(54) Efficient directional genetic cloning system

(57) A highly efficient genetic cloning system is disclosed which is particularly useful for cloning cDNA
copies of eukaryotic mRNAs and can direct the orientation of inserts in plasmid composite vectors with large
cloning capacities. Cleavage of such vector DNA by the restriction enzyme SfiI, for example, creates two
different non-symmetrical 3' extensions at the ends of vector DNA. Using a linker-primer and an adaptor, cDNA
is prepared to have two different sticky ends which can be ligated to those of the vector. When the cDNA
fragments and the vector DNAs are mixed, both the molecules can assemble without self-circularization due to
base-pairing specificity. This system provides (1) high cloning efficiency (10^-10^ clones/µg poly(A)+ RNA),
(2) low background (more than 90% of the clones contain inserts), (3) directional insertion of cDNA fragments
into the vectors, (4) presence of a single insert in each clone, (5) accommodation of long inserts (up to
10 kb), (6) a mechanism for rescue of the plasmid part from a λ genome, and (7) a straightforward protocol
for library preparation.
Description

BACKGROUND OF THE INVENTION 

FIELD OF THE INVENTION

The present invention relates to vectors for molecular cloning of DNA segments, particularly to cloning vectors
employing non-symmetrical restriction enzyme recognition sites for insertion of DNA segments in a defined direction
relative to vector. This invention also relates to use of such vectors in methods for efficient cloning of genomic DNA
segments and of complementary DNA (cDNA) copies of messenger RNA (mRNA) molecules from eukaryotic genes, and
to the manufacture and use of novel products related to these vectors and methods.

BACKGROUND OF THE INVENTION

The development of DNA cloning techniques for complementary DNA (cDNA) copies of messenger RNA (mRNA)
molecules has been of great value in the study of eukaryotic genes. In many cases, the amount of a given mRNA for 
which cDNA clones are desired is limited by the availability of appropriate tissue sources and/or a low concentration of 
that specific mRNA in those sources. Therefore, readily obtainable sources may provide only a few copies of a given 
mRNA molecule from which cDNA clones might be produced. 

The requirements for any efficient method for cDNA cloning may be generally summarized as follows: first,
full-length double-stranded cDNAs must be produced from the mRNA with high yield; the ends of the resulting DNA
fragments must be made capable of being joined efficiently to the vector DNA by enzymatic ligation; production of
undesirable ligation byproducts must be minimized; and, preferably, insertion of the cDNA into the vector DNA should
provide expression of the cDNA to facilitate detection of the desired clone by means of the product.

Production of the protein product may be necessary for detecting a gene when no nucleic acid probes for the
desired gene are available. More generally, such expression of the protein is desirable because, in terms of copy
number, the protein provides a molecular signal that is greatly amplified in relation to the DNA molecules of the cloned
gene inside the host cell.

As it is difficult to achieve high efficiency of conversion of mRNA molecules into full-length cDNA clones, especially
when the mRNA of interest is relatively long, several refinements in cDNA cloning strategy have been made. Among
them, the Okayama-Berg method significantly improved the efficiency of full-length cDNA cloning.

The Okayama-Berg approach has several advantages over previous, conventional methods for cloning cDNAs. 
The following section is intended to highlight these advantages in relation to the main steps of this complicated method. 
For a more complete and detailed description of the method, see the original publication [Okayama, H. and Berg, P.
(1982) Mol. Cell. Biol. 2, 161-170], which is hereby incorporated herein by reference.

The main advantages of the Okayama-Berg method for cDNA cloning relate to the fact that, as part of the processing
needed to form mRNAs, transcripts of eukaryotic genes undergo enzymatic addition of multiple adenosine residues at
the 3' end, thereby acquiring what is known as a "poly(A) tail". In the present context, the term mRNA encompasses
any RNA species from any source, natural or synthetic, having a 3' poly(A) tail comprising two or more adenosine
residues.

In the original Okayama-Berg approach, synthesis of the first DNA strand from the mRNA template is initiated by
annealing the 3' poly(A) of the eukaryotic mRNA to an oligo(dT) primer which forms an extension of one end of a DNA
strand of the cloning plasmid. First strand cDNA synthesis by this "plasmid-priming" method directs the orientation of
the sequence within the cDNA into a unique relationship with the sequence in the plasmid; hence, this approach has
been called "directional" cloning. Directional cloning ensures that every cDNA clone that is formed will be correctly
oriented for a promoter provided in the cloning plasmid (an SV40 promoter in the original Okayama-Berg system) to drive
transcription of the proper cDNA strand to produce RNA with the correct sense for translation into the protein encoded
by the original mRNA template.

To provide high efficiency of ligation in cloning DNA segments in general, restriction nucleases are utilized to
produce short single-stranded ends on the DNA that are complementary in base sequence to any other DNA end produced
by the same enzyme. Accordingly, these single-stranded ends can anneal together by forming specific DNA base pairs,
or, in the vernacular, they are "sticky". This annealing greatly enhances the rate of joining DNA segments by enzymatic
ligation and further provides a means for selectively joining ends of segments treated with the same enzyme.

In the original Okayama-Berg method, after synthesis of the first cDNA strand, an oligo(dG) tail is attached
enzymatically to the free end of the plasmid-primed cDNA, and then the plasmid is cleaved by a restriction enzyme (HindIII)
to produce a sticky end on the plasmid opposite to the end where the cDNA is attached. A short DNA fragment
("linker"), which contains the SV40 promoter and has a cleaved HindIII site on one end and oligo(dC) on the other, is
then attached to the cDNA-plasmid molecule by ligation, to circularize the molecule.

In other, more conventional methods a (synthetic) linker may also be used to clone cDNAs, but it is attached after
second strand DNA synthesis and further enzymatic repair which is necessary to form perfectly matched strands (i.e.,
a "flush" or "blunt" end). To protect internal restriction sites of the double-stranded cDNA from cleavage by the
restriction enzyme required to allow ligation of the vector and linker, prior to addition of the linker, the cDNA is
methylated with the appropriate DNA modification system associated with the given restriction enzyme. However, such
protection may not be absolute; thus, internal sites may be cleaved at some frequency due to an incomplete methylation
reaction. In contrast, in the original Okayama-Berg method, this problem of internal cleavage of cDNAs is obviated by
cleavage of HindIII sites on the vector when the cDNA is represented as an RNA:DNA hybrid that resists restriction.

The Okayama-Berg approach provides yet another advantage over previous methods in which both ends of separately
synthesized cDNAs are ligated to the vector ends at the same time, namely that according to Okayama-Berg, the
necessary circularization of the vector DNA with the cDNA attached at one end is relatively efficient via the linker
because only one juncture between the cDNA and vector molecules remains to undergo ligation.

Furthermore, the overall Okayama-Berg approach offers additional advantages over previous methods. Following
circularization, a process called "RNA nick translation" using DNA polymerase I and RNase H is used which facilitates
complete synthesis of the second strand along the entire first strand. This process overcomes the inherently low
processivity of DNA polymerase I by using multiple sites for priming of second strand DNA synthesis with DNA primer
fragments having random sequences.

Finally, since the Okayama-Berg vector has already been joined to the cDNA when the second strand is synthesized,
truncation of cDNA molecules close to the 3' end of the cDNA generally does not occur, in contrast to other methods
in which the second strand is completed while the 3' end of the first strand is free and, therefore, more susceptible
to damage from nuclease activities.

Cloning vectors based on bacteriophage λ are also known. The second strand synthesis reaction of the Okayama-Berg
method has also been utilized in a simpler cloning procedure [Gubler, U. and Hoffman, B. J. (1983) Gene 25, 253-269],
allowing cDNA cloning in such λ vectors [Huynh, T. V., Young, R. A. and Davis, R. W. (1985) in DNA Cloning, A
Practical Approach, ed. Glover, D. (IRL, Oxford), Vol. I, pp. 49-78]. This λ-based cDNA cloning method has been widely
used, mainly due to the high efficiency of transmission of recombinant DNA into cells by means of infectious phage
particles, which are produced with in vitro DNA "packaging" systems. λ phage cloning systems also offer convenient clone
screening capabilities due to tolerance of a high density of λ plaques on test plates to be screened, compared with most
plasmid systems which permit only lower densities of bacterial host colonies.

Early λ systems for cDNA cloning, however, while retaining the second strand synthesis strategy of the original
Okayama-Berg plasmid method, lack some of its other advantages. For example, directional cloning is not possible in
those original λ systems. In addition, multiple inserts and truncated cDNAs are frequently obtained. Further, despite the
high packaging efficiency for native λ DNA molecules, the packaging efficiency of recombinant DNA molecules that are
produced by cleavage of intact linear λ molecules and ligation with cDNA fragments is usually low compared to that of
intact λ DNA.

Recently, directional cloning capabilities have been introduced into various λ vectors. For example, one such
directional λ vector employs a site for insertion of DNA segments that comprises two different restriction enzyme cleavage
sites [Meissner, P. S., et al. (1987) Proc. Nat. Acad. Sci. USA 84, 4171-4175]. The cDNA molecules are primed with
oligo(dT), made double-stranded, and then methylated with the enzymes needed for protection against internal cleavage
by both of the nucleases used in the DNA insertion site of the vector. A linker segment containing a cleavage site
for only one of the nucleases of the insertion site is added to both ends of the cDNA. The combination of the last two
A:T base pairs on the 3' end of the cDNA with the sequences at one end of the linker, however, creates a cut site for the
other of the two nucleases of the insertion site. Thus, after restriction with both nucleases of the insertion site, the
individual cDNA segments can ligate into the vector only in a single direction with respect to the two different cleavage
sites in the vector.

Various general disadvantages of this particular approach for cDNA cloning in λ phage, compared to the Okayama-Berg
plasmid method, have been described above in relation to other systems; and other problems specific to this
approach have been noted [Meissner, P. S., et al. (1987), supra]. Nevertheless, it was reported that one cDNA library
constructed by this method, starting from 5 µg of mRNA, contained about 2 x 10^6 clones with 8 of 10 having cDNA
inserts (i.e., the reported cloning efficiency was about 3 x 10^5 recombinants per µg of poly(A)+ RNA).

Directional cloning in other λ phage vectors has also been reported [Palazzolo, M. J. and Meyerowitz, E. M. (1987)
Gene 52, 197-206]. [These vectors are known as λSWAJ or λGEM, certain variants of which (LambdaGEM™-2 and
LambdaGEM™-4) are commercially available from Promega Corporation of Madison, Wisconsin. The λGEM type of
vectors are also examples of a composite vector comprising both a λ phage genome and an embedded plasmid
(GEM)]. The directional cloning scheme in these λ vectors utilizes two different restriction enzyme cleavage sites at the
site for insertion of DNA. Thus, for example, to attach the end of a cDNA corresponding to the poly(A) end of the mRNA
to a particular end of the cleaved vector DNA that has a sticky end for the restriction enzyme SacI, a synthetic DNA
"linker-primer" segment is used which combines a single-stranded oligo(dT) primer with a restriction site for the enzyme
SacI. After second strand synthesis, a linker segment with the site of a second restriction enzyme is ligated to the other
end of the cDNA, which is then restricted with both enzymes of the insertion site of the vector, according to much the
same strategy as described for the previous example of a directional λ phage vector.

This particular approach for directional cloning in a λ vector, however, cannot be used to obtain full-length cDNAs
of certain mRNAs because it requires cleavage of the cDNA molecules by the restriction enzyme SacI and a second
enzyme (e.g., XbaI) without first protecting the internal sites for these enzymes by appropriate methylation. [In an
alternative version of the scheme reported by Palazzolo and Meyerowitz, supra, the XbaI enzyme was replaced by EcoRI
and the cDNA was methylated to protect against only this one enzyme.] Sites for these particular enzymes occur
frequently by chance in natural nucleotide sequences. Thus, restriction of cDNAs with enzymes like these, as taught in this
approach, causes truncation of cDNA inserts with internal SacI (and/or XbaI) sites. In relation to cloning efficiency, it
may be noted that this publication described a single cDNA library constructed by this method, starting from 1 µg of
poly(A)+ RNA, that contained about 1.6 x 10^ clones with cDNA inserts.

In addition to the publications on directional cloning systems described above, there is a report which describes a
non-directional plasmid-based system that uses an efficient oligonucleotide-based strategy to promote cDNA insertion
into the vector [Aruffo, A. and Seed, B. (1987) Proc. Nat. Acad. Sci. USA 84, 8573-8577]. This method uses synthetic
DNA adaptors that encode a recognition site for a particular restriction enzyme, BstXI, which has a variable recognition
sequence, as illustrated below:

5'-CCANNNNN↓NTGG-3'
3'-GGTN↑NNNNNACC-5'



where A, T, G and C indicate nucleotides having the DNA bases adenine, thymine, guanine, and cytosine, respectively
(for which the pairs A:T and G:C are complementary), and N and N represent bases that are included within the
recognition site sequence but that can be any of the usual DNA bases, provided only, of course, that each N and the
corresponding N on the opposite DNA strand be complementary. The arrows (↓ and ↑) indicate the cleavage sites on the
upper and lower DNA strands, respectively. Accordingly, cleavage of the BstXI site creates a 4-base single-stranded
extension (sticky end) on the 3' end that varies from site to site.
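
The cleavage geometry just described can be sketched in a few lines of Python. This is only an illustrative model: the function name `bstxi_cut` and the example sequence are assumptions made for the sketch and do not come from the cited report.

```python
import re

# BstXI site on the top strand (5'->3'): CCA, six variable bases, TGG.
BSTXI_SITE = re.compile(r"CCA[ACGT]{6}TGG")

def bstxi_cut(top_strand):
    """Cut at the first BstXI site and return (left, right, overhang).

    The top strand is cut after the eighth base of the site (CCANNNNN^NTGG);
    the bottom strand is cut four bases to the left, so the left fragment
    carries a 4-base single-stranded 3' extension.
    """
    m = BSTXI_SITE.search(top_strand)
    if m is None:
        raise ValueError("no BstXI site in sequence")
    top_cut = m.start() + 8
    bottom_cut = m.start() + 4
    overhang = top_strand[bottom_cut:top_cut]
    return top_strand[:top_cut], top_strand[top_cut:], overhang

# Hypothetical test sequence containing one site (CCA ATGTGA TGG).
left, right, sticky = bstxi_cut("GGATCCAATGTGATGGAATT")
print(left, right, sticky)
```

Because the four overhang bases fall within the variable positions, a different site yields a different sticky end, which is the property the text relies on.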

The report above discloses a plasmid vector with a site for insertion of DNA segments in which two identical BstXI
sites were placed in inverted orientation with respect to each other and were separated by a short replaceable segment
of DNA. Inversion of a DNA sequence consists of representing the base sequence of each strand, conventionally
expressed in the 5' to 3' direction of the polynucleotide backbone, in a DNA strand with the same base sequence
presented in the 3' to 5' direction (e.g., inversion of the DNA sequence 5'-ACTG-3' produces the DNA sequence 3'-ACTG-5'
or, in the conventional 5' to 3' format, 5'-GTCA-3').
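
A short Python sketch may help keep inversion and complementation apart. The helper names are hypothetical, and the last line reflects an interpretation rather than a statement from the report: when a double-stranded site is placed in inverted orientation, the sequence read 5' to 3' along the top strand is taken to be the reverse complement of the original top strand.

```python
COMPLEMENT = {"A": "T", "T": "A", "G": "C", "C": "G", "N": "N"}

def invert(seq):
    """Same bases presented in the opposite direction (5'-ACTG-3' -> 5'-GTCA-3')."""
    return seq[::-1]

def complement(seq):
    """Base-by-base complement, keeping the written order."""
    return "".join(COMPLEMENT[base] for base in seq)

def reverse_complement(seq):
    """Sequence of the opposite strand, written 5'->3'."""
    return complement(invert(seq))

inverted_example = invert("ACTG")
inverted_site_top = reverse_complement("CCANTGTGNTGG")
print(inverted_example, inverted_site_top)
```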

With the particular BstXI recognition sequence that was employed in this vector, the 4-base single-stranded ends
of the inverted sites created on the two ends of the vector DNA by restriction with the BstXI enzyme were not able to
anneal with one another. This situation is illustrated below, where two identical sites, one inverted relative to the other
and separated by an unspecified sequence (N...N), are shown; the sticky ends of the vector produced by cleavage
with the BstXI enzyme are shown in bold print:

5'-(vector)CCANTGTGNTGG(N...N)CCANCACANTGG(vector)-3'
3'-(vector)GGTNACACNACC(N...N)GGTNGTGTNACC(vector)-5'

(Note that the reference does not specify the entire BstXI recognition sequence that was used; only the sequence of
the sticky end is clearly defined, as indicated above by inclusion of the N symbol where necessary).

Inspection of these single-stranded end sequences on this plasmid vector reveals that they are identical, due to the
inversion of one of the sites relative to the other. Thus, the ends of the vector with inverted and non-inverted copies of
this particular BstXI restriction site sequence cannot anneal with each other. Similarly, the restricted ends of the spacer
DNA segment between these two sites will be identical. Accordingly, to clone cDNA segments in this vector, a synthetic
adaptor was attached to each end at the double-stranded stage, by blunt end ligation, giving them the same termini as
the replaceable segment that was removed from the vector with BstXI. The specific adaptor used in the above report
comprises the following oligonucleotide sequences:

5'-CTTTAGAGCACA-3'
3'-GAAATCTC-5'.



Obviously, addition of this single adaptor to both ends of the cDNA segments would provide those segments with ends
(in bold type) that could anneal and subsequently ligate efficiently to both identical vector ends.

Thus, Aruffo and Seed, 1987, supra, discloses a method using this particular BstXI recognition site sequence,
whereby neither the cDNA (with attached adaptors) nor the isolated vector DNA (after being freed from the replaceable
segment after cleavage with BstXI) was able to ligate to itself. This work, however, neither teaches nor suggests general
requirements for a BstXI recognition sequence, or for those of other restriction enzymes, to be usable in this cloning
approach.
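
The pairing behavior of these ends can be checked with a small Python sketch. It assumes that the vector ends carry the 3' extension 5'-TGTG-3' and that the adaptor consists of the two oligonucleotides given above; the function names are hypothetical and the model ignores everything except base pairing of the single-stranded extensions.

```python
COMPLEMENT = {"A": "T", "T": "A", "G": "C", "C": "G"}

def reverse_complement(seq):
    """Sequence of the opposite strand, written 5'->3'."""
    return "".join(COMPLEMENT[base] for base in reversed(seq))

def can_anneal(end_a, end_b):
    """Two single-stranded extensions (both written 5'->3') can pair
    only if one is the reverse complement of the other."""
    return end_a == reverse_complement(end_b)

# Adaptor oligonucleotides, both written 5'->3'.
long_oligo = "CTTTAGAGCACA"
short_oligo = "CTCTAAAG"          # 3'-GAAATCTC-5' read in the 5'->3' direction

# The short strand pairs with the first eight bases of the long strand,
# leaving the last four bases of the long strand single-stranded.
assert reverse_complement(short_oligo) == long_oligo[:len(short_oligo)]
adaptor_end = long_oligo[len(short_oligo):]
vector_end = "TGTG"

print("vector + adaptor:", can_anneal(vector_end, adaptor_end))
print("vector + vector: ", can_anneal(vector_end, vector_end))
print("adaptor + adaptor:", can_anneal(adaptor_end, adaptor_end))
```

Under these assumptions only the vector-adaptor combination pairs, which is consistent with the statement that neither the adapted cDNA nor the vector can ligate to itself.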

Further, as these workers pointed out, their strategy did not provide a directional cloning capability. After first
alleging that such directional capability was not needed, they admitted that, nonetheless, they had devoted considerable
unsuccessful efforts to developing an alternative means of producing mRNA from every cDNA clone, namely a
bidirectional transcription capability whereby both strands of an inserted cDNA would be transcribed. They concluded that
this goal cannot be easily attained, at least not in their cloning host system. The authors stated, moreover, that they could
obtain cloning efficiencies with their plasmid that were between 0.5 and 2 x 10^6 recombinants per µg of mRNA, which
were said to compare favorably with those described for certain cloning systems based on phage λ. In the only example
of a cDNA library described in this reference, however, the yield of cDNA clones obtained by this method was actually
stated to be only about 3 x 10^5 recombinants from 0.8 µg poly(A)-containing RNA (i.e., less than 0.4 x 10^6 recombinants
per µg poly(A)-containing RNA).
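
The per-microgram efficiencies quoted in this Background follow from simple arithmetic, sketched below in Python. The exponents are read here as 2 x 10^6 clones (8 of 10 with inserts, from 5 µg) and 3 x 10^5 recombinants (from 0.8 µg); that reading is an assumption chosen because it makes the quoted totals and the quoted per-microgram figures agree with each other. The function name is hypothetical.

```python
def recombinants_per_ug(total_clones, fraction_with_insert, rna_ug):
    """Clones that actually carry an insert, per microgram of starting RNA."""
    return total_clones * fraction_with_insert / rna_ug

# Library with two different cleavage sites in a phage vector (figures as read above).
two_site_phage = recombinants_per_ug(2e6, 8 / 10, 5.0)

# Plasmid library made with the BstXI adaptor strategy (figures as read above).
bstxi_plasmid = recombinants_per_ug(3e5, 1.0, 0.8)

print(f"{two_site_phage:.2e} recombinants per ug")
print(f"{bstxi_plasmid:.2e} recombinants per ug")
```

With these inputs the first value is about 3 x 10^5 and the second is just under 0.4 x 10^6, matching the rounded figures given in the text.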

Thus, there has been a continuing need for methods and vectors which would provide a higher yield of cDNA 
clones from limited amounts of eukaryotic mRNAs while also providing an improved means of directing orientation of 
inserted cDNA fragments within vector DNAs. 


SUMMARY OF THE INVENTION 

The present invention contemplates the application of methods of recombinant DNA technology to fulfill the above
needs for increased efficiencies in DNA cloning systems and, in particular, to develop new means for directional
insertion of cDNA fragments into cloning vectors.

More specifically, it is an object of the present invention to provide means for directing assembly of insert DNAs into
vector DNAs to form a unique, predetermined recombinant structure having the desired number and orientation of each
needed DNA fragment, so that the number of resulting clones containing single inserts, as well as the probability of
obtaining a full-length clone from each mRNA molecule, are enhanced.

Further, it is an object of this invention to provide a cDNA cloning system which combines the features of this highly
efficient cloning strategy with advantageous features of λ phage vectors to overcome limitations of the presently
available λ cloning systems.

Accordingly, the present invention relates to highly efficient means for inserting DNA segments into cloning vectors
in a defined orientation, and a method for using such means that is referred to herein as the "automatic directional
cloning (ADC)" method. Novel DNA vectors and DNA segments are also included.

Appreciation of the operation and advantages of this invention requires further analysis of the problems in the prior
approaches. The understanding of these problems by the present inventors led to development of this invention.

The present invention has been developed in light of recognition by these inventors of major sources of the
limitations on cloning efficiency with the present systems designed for directional cloning, namely that they all employ
restriction enzymes with recognition site sequences having one or both of the following disadvantages: they are either
too short or they have a particular type of symmetry called "dyad" symmetry.

As noted above, present λ phage vectors for directional cloning of cDNAs suffer inefficiencies due in part to their
use of restriction enzymes with recognition sequences that occur frequently in natural DNA sequences. Some problems
relating to this issue might be solved by choosing a restriction enzyme with an infrequently occurring site (i.e., a longer
recognition sequence which, by chance, would occur less frequently in random natural DNA sequences).

However, even when modified to utilize an infrequently cutting restriction enzyme, the present implementations of
directional cloning in λ have a drawback that is common to any cloning scheme using restriction enzymes with
recognition sequences having dyad symmetry of the sticky ends produced by cleavage with the enzyme. Typical restriction
enzymes with recognition site sequences having dyad symmetry make staggered cuts in the two opposing DNA strands
at symmetrical points surrounding the center of a dyad pattern. Cleavage by this type of enzyme produces short
single-stranded ends which are complementary in base sequence to those of any other DNA fragment produced by cleavage
with the same enzyme.

For example, the recognition site of the commonly used restriction enzyme, EcoRI, consists of the following
complementary sequences which, when cleaved by the enzyme, produce the 4-base extension of the 5' end of the DNA
containing the dyad "TTAA" (shown in bold face type):

5'-G↓AATTC-3'
3'-CTTAA↑G-5'



where A, T, G and C indicate DNA bases, as described above for the BstXI site. Inspection of this EcoRI sticky end
sequence readily reveals that inversion of this sequence produces its complement, namely "AATT". Thus, any DNA end
produced by EcoRI can anneal to any other such DNA end; and, therefore, any EcoRI sticky end can also be ligated
efficiently to any other such end. Similarly, all DNA ends which are produced by any one restriction enzyme that
generates sticky ends characterized by dyad symmetry are in the case of each sticky end sequence readily ligatable one
to another. [Hereinafter, such "self-ligatable" single-stranded ends of DNA that are producible by a restriction enzyme
will be simply referred to as "symmetrical ends", and the enzymes that produce them, as "symmetrical restriction
enzymes".]
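
The distinction between symmetrical and non-symmetrical ends reduces to a one-line test, sketched here in Python under the assumption that a sticky end is represented only by the base sequence of its single-stranded extension. The function names are hypothetical.

```python
COMPLEMENT = {"A": "T", "T": "A", "G": "C", "C": "G"}

def invert(seq):
    """Same bases presented in the opposite direction."""
    return seq[::-1]

def complement(seq):
    """Base-by-base complement, keeping the written order."""
    return "".join(COMPLEMENT[base] for base in seq)

def is_symmetrical_end(extension):
    """An end is 'symmetrical' (self-ligatable) when inverting its
    single-stranded extension gives the complement of that extension."""
    return invert(extension) == complement(extension)

ecori_end = "TTAA"   # extension left by EcoRI, as written in the text
bstxi_end = "TGTG"   # extension left by the BstXI site discussed in the Background

print("EcoRI end symmetrical:", is_symmetrical_end(ecori_end))
print("BstXI end symmetrical:", is_symmetrical_end(bstxi_end))
```

The EcoRI extension passes the test and so can pair with any other EcoRI end, whereas the BstXI extension does not and cannot pair with an identical copy of itself.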

In light of this symmetrical nature of many restriction enzyme recognition sites, one of the major problems with
existing directional λ vectors can be more fully appreciated. When the two end fragments of cleaved λ DNA (i.e., the
so-called λ "arms") are ligated with cDNA fragments, several products are produced, only some of which constitute the
desired infectious DNA molecules containing cDNA inserts. For instance, consider the simplest case, when the ends
on both the cDNA and on each of the "left" and "right" λ arms (as the two λ DNA arms have been designated in a
genetic mapping convention) have been cut by the same symmetrical restriction enzyme. Here, linear structures other
than those with the desired order (i.e., "left arm-cDNA insert-right arm") may form in significant amounts during ligation
with cDNA fragments; and the cDNAs trapped in these other, nonviable structures cannot produce phage clones. These
undesirable ligation byproducts may include self-ligation products of the two ends of individual vector or cDNA
segments, consisting of circular DNAs. Ligation products in this instance may also comprise vector-cDNA combinations
containing multiple inserts, which, even if viable, may create problems in expression or identification of original mRNA
structure.

On the other hand, when each end of the vector and insert cDNA molecule are ultimately produced by two different
symmetrical restriction enzymes, as in the present directional λ systems, these ends are then physically distinguishable
in relation to the polarity of the encoded genetic information in each DNA segment, i.e., the matching of complementary
sticky ends on vector DNAs and cDNAs results in the desired directional cloning of the cDNA insert relative to functional
sequences in the vector (e.g., a promoter). Further, circularization due to self-ligation of cDNAs or vectors without
inserts is eliminated by the use of two different symmetrical restriction enzymes.

Other undesirable ligation byproducts remain, however, in the usual two enzyme approach for directional cloning
using symmetrical enzymes. Some of these are dimers of vector or cDNAs, which may be designated, for example, as
"tail-to-tail" or "head-to-head" dimers. Thus, even when vector and cDNAs are made by cutting with two different
symmetrical enzymes, head-to-head and tail-to-tail dimers are not eliminated, although the population of desired molecules
is significantly higher.

In contrast to existing systems based on λ phage, the automatic directional cloning method does not permit cDNA
or vector fragments to ligate to each other, ensuring the presence of a single insert in each clone, as well as higher
cloning efficiencies and lower backgrounds of clones that do not contain cDNA inserts.

To accomplish these goals, the present invention contemplates use of restriction enzymes which produce
single-stranded ends that do not exhibit dyad symmetry (hereinafter referred to as "non-symmetrical ends" and,
correspondingly, non-symmetrical recognition site sequences and enzymes). Although certain preferred embodiments of the
present invention employ derivatives of bacteriophage λ as the vector, which further comprise embedded plasmid
genomes, this invention can be practiced with any self-replicating DNA molecule (i.e., a "replicon") serving as the vector
for DNA cloning in any host in which the selected replicon can be replicated.

Work cited in the Background above describes a plasmid-based system that advantageously employs two identical
BstXI recognition site sequences, albeit in two different orientations. This single recognition sequence is
non-symmetrical according to the definition in the present disclosure, although the reference does not describe the BstXI
sequence in such terms or otherwise characterize this sequence as such. The present invention is clearly distinguishable
from this previous approach, as described below.

Use of BstXI sites is not readily applicable to the λ system, due to the existence of multiple BstXI recognition sites
in the λ phage genome, owing to the number of base pairs in the variable recognition sequence that are not allowed to
vary (i.e., the "invariable base pairs" being only six).
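
A rough estimate shows why the number of invariable base pairs matters. The Python sketch below assumes a random sequence with equal base frequencies and a genome length of 48,502 base pairs for wild-type phage λ; both are simplifications introduced for illustration, and the function name is hypothetical. The figures are expectations, not counts of actual sites.

```python
LAMBDA_GENOME_BP = 48502  # assumed length of wild-type lambda DNA

def expected_sites(genome_bp, invariable_bp):
    """Expected number of occurrences of a site with the given number of
    fixed positions in a random sequence with equal base frequencies."""
    return genome_bp / 4 ** invariable_bp

six_fixed = expected_sites(LAMBDA_GENOME_BP, 6)    # e.g. CCA......TGG
eight_fixed = expected_sites(LAMBDA_GENOME_BP, 8)  # e.g. a site with eight fixed pairs

print(f"6 invariable bp: about {six_fixed:.1f} sites expected")
print(f"8 invariable bp: about {eight_fixed:.2f} sites expected")
```

Under these assumptions a site with six fixed positions is expected roughly a dozen times in a genome of this size, while a site with eight fixed positions is expected less than once.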

Accordingly, in one aspect the present invention relates to a genetic cloning vector comprising at least one replicon,
and a site for inserting DNA segments to be cloned that includes at least two non-symmetrical restriction enzyme
recognition sequences that are identical, where the first of these identical recognition sequences is in the inverted
orientation with respect to a second identical sequence; and, in addition, the identical restriction enzyme recognition
sequences include greater than six positions having invariable DNA base pairs. Recognition site sequences of the
enzyme SfiI, for example, fulfill both the length and asymmetry requirements for this aspect of the invention, as will
become evident below.

On the other hand, in plasmid systems or other replicons lacking BstXI sites, either naturally or due to genetic
engineering, two BstXI site sequences that are in the same instance non-symmetrical and nonidentical can be
advantageously employed for efficient cloning of DNA segments according to the present invention.

More generally, this aspect of this invention may be practiced with any two non-symmetrical restriction enzyme
recognition sequences that are not identical (recognition site sequences of the enzyme SfiI, for instance).

Accordingly, the present invention also relates to a genetic cloning vector comprising at least one replicon, and a
site for inserting DNA segments to be cloned that includes at least two non-symmetrical restriction enzyme recognition
sequences that are nonidentical. In this aspect of the invention, two of the non-symmetrical restriction enzyme
recognition sequences can be selected advantageously to be cleavable by a single restriction enzyme, for example, BstXI or,
alternatively, SfiI; or each of two nonidentical restriction enzyme recognition site sequences may be selected to be
cleavable by a different enzyme. Preferably, at least one of the non-symmetrical recognition sequences includes greater
than six positions having invariable DNA base pairs; and most preferably, two nonidentical recognition sequences
include greater than six positions having invariable DNA base pairs, as typified by SfiI recognition site sequences.

The present invention further relates to a vector, as described above, in which the replicon comprises a form of
bacteriophage λ.

The vector may advantageously further comprise regulatory elements located in relation to the site for insertion of
DNA segments such that, when a DNA segment is inserted into this site, at least a portion of the sequences of the DNA
segment is transcribed. This portion may be derived from either one of the strands of the inserted double-stranded DNA
segment, or from both of these strands.

In one major embodiment of this aspect of this invention, these regulatory elements in the vector consist of
promoters that entirely originate from bacteriophage. By the phrase "originate from" it is meant that the regulatory
element (e.g., promoter) is encoded in the genome of the instant organism or virus (e.g., bacteriophage) as it occurs in
nature. It should be noted here that it is well known that, generally, promoters that originate from bacteriophage are
not able to initiate transcription in eukaryotic hosts. This particular embodiment of the present invention is
exemplified by two λ-plasmid composite vectors, LambdaGEM™-11 and LambdaGEM™-12, which are commercially available from
Promega Corporation of Madison, Wisconsin.
According to available information at the time of the present disclosure, these particular LambdaGEM™ vectors
apparently were first disclosed in the 1988/1989 Catalogue and Applications Guide for Biological Research Products
published and distributed by Promega Corporation in August of 1988, the entirety of which is hereby incorporated
herein by reference. The following excerpts of that catalog describe these particular vectors and some of their various
uses, particularly those relating to transcription from bacteriophage promoters.

Section 11, page 5:

The LambdaGEM-11 vector is a multi-functional genomic cloning vector designed for high resolution mapping of
recombinant inserts, simplified genomic library construction, ultra-low background of non-recombinants, and rapid
genomic walking. This lambda replacement-type cloning vehicle contains the following features (Figure 2 [not shown]):
dual opposed bacteriophage T7 and SP6 RNA polymerase promoters, flanking asymmetric SfiI restriction sites, and a
multiple cloning site with strategically positioned XhoI and BamHI restriction sites. The LambdaGEM-11 vector also
contains unique sites for SacI, AvrII, EcoRI, and XbaI. Because it is a derivative of EMBL3 (1), DNA fragments ranging
from 9-23kb can be cloned in the LambdaGEM-11 vector and the Spi phenotypic selection against non-recombinants
is available. The vector was designed to make use of the SfiI recognition sites flanking the promoters for the high
resolution restriction mapping of insert DNA using the Sfi linker mapping system (Sec. 11, pg. 8).

The T7/SP6 phage promoters simplify chromosomal "walking", as RNA probes synthesized from the extremities of
the cloned insert can be used to search a library for overlapping sequences in either direction. In addition, the
nucleotide sequence of the end of an insert cloned in the LambdaGEM-11 vector can be obtained directly from the phage
template by hybridizing an SP6 or T7 oligonucleotide primer, followed by a chain termination sequencing reaction (2,3).

Two cloning strategies for genomic library construction, using DNA partially digested with MboI or Sau3AI, are
available with the LambdaGEM-11 dephosphorylated BamHI arms. A new cloning strategy (4) relies on the exclusive
specificity with which partially filled-in XhoI LambdaGEM-11 arms (XhoI half-site arms) can be combined with partially
filled-in Sau3AI digested genomic DNA. The only ligation products possible are single copies of genomic inserts with
appropriate arms, since the partial fill-in prevents self-ligation reactions of vector arms, central stuffer, and genomic
fragments. This method also makes genomic DNA fractionation unnecessary, is very rapid (Figure 3 [not shown]), and
requires small amounts of starting material. The XhoI and BamHI sites in the LambdaGEM-11 vector are strategically
positioned 6 and 11 bases, respectively, from the transcription initiation site of either promoter.

As measured by in vitro packaging, recombinant efficiencies of 3 x 10^ pfu/µg DNA have been achieved using a
test insert ligated to LambdaGEM-11 dephosphorylated BamHI or XhoI half-site arms using Packagene® lambda
packaging extracts. The background for self-ligated arms alone is typically <100 pfu/µg DNA in either case. This
ultra-low background level of non-recombinant vector DNA has three important advantages: it eliminates the need for the Spi
genetic selection against the parental vector, which is known to result in biased libraries (5); non-productive ligation
events are minimal, thereby resulting in larger genomic libraries; and fewer filters need be processed for screening a
library. For detailed protocols describing the use of this vector, see Sec. 11, pg. 12.

Section 11, page 6:

The LambdaGEM-12 vector is a multi-functional genomic cloning vector designed for high resolution restriction
mapping of recombinant inserts, simplified genomic library construction, ultra-low background of non-recombinants,
and rapid genomic walking. This lambda replacement-type cloning vehicle contains the following features (Figure 4 [not
shown]): dual opposed bacteriophage T7 and SP6 RNA polymerase promoters, RNA polymerase promoters [sic],
flanking asymmetric SfiI restriction sites, and a multiple cloning site with strategically positioned NotI and BamHI
restriction sites. The LambdaGEM-12 vector also contains unique sites for SacI, EcoRI, XhoI, and XbaI. Because it is
a derivative of EMBL3 (1), DNA fragments ranging from 9-23kb can be cloned in the LambdaGEM-12 vector and the Spi
phenotypic selection against non-recombinants is available.

Accordingly, the present invention relates to a vector that is either LambdaGEM-11 or LambdaGEM-12. Further
details of the use of these vectors for restriction mapping of inserted DNA segments, according to the Sfi Linker
Mapping System mentioned above, are extracted from the Promega catalog below.

Section 11, page 8:

The multi-functional LambdaGEM-11 and LambdaGEM-12 genomic cloning vectors have been engineered
specifically for high resolution restriction mapping of recombinant inserts. The vectors, derivatives of EMBL3, possess SfiI
restriction sites flanking bacteriophage T7/SP6 RNA polymerase promoters and a multiple cloning region (Sec. 11, pg.
5). The flanking SfiI restriction sites allow most inserts to be excised as a single fragment, since this 8-base recognition
sequence occurs infrequently in genomic DNA (in theory, once every 65,536bp).
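
The theoretical figure quoted above follows from simple counting. The sketch below assumes, as that estimate implicitly does, that the four bases are equally frequent and that positions are independent; the function name is hypothetical.

```python
def expected_spacing_bp(specified_positions):
    """Average distance between occurrences of a site with this many fixed bases.

    Assumes equal base frequencies and independent positions, so each
    fixed base matches with probability 1/4.
    """
    return 4 ** specified_positions


# An 8-base recognition sequence versus a typical 6-base one.
print(expected_spacing_bp(8))  # 65536
print(expected_spacing_bp(6))  # 4096
```

Real genomes deviate from these assumptions (base composition is uneven), so the value is only a rough guide to how rarely an 8-base site is expected to fall within an insert.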

SfiI recognizes the interrupted palindrome GGCCNNNN/NGGCC and cleaves within the central unspecified
sequence, leaving a 3-base 3' overhang. The nucleotide sequence of the central region which becomes the
overhanging termini thus may contain any of the four possible bases. The flanking SfiI sites in the vectors have been designed
in an asymmetric fashion, so that the site on the left is distinct from the site on the right. Therefore, radiolabeled linkers
complementary to either the left or right SfiI termini can be ligated separately to the SfiI excised genomic DNA. Once
the insert has been asymmetrically labeled, a high resolution restriction map can be determined by partial digestion
with a frequent cutting restriction endonuclease such as Sau3AI followed by gel electrophoresis and autoradiography
(Figure 6 [not shown]).
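
As an illustration of the cleavage geometry just described, the following sketch locates SfiI sites in a top-strand sequence and reports the two 3-base 3' overhangs produced at each site. The example sequence and the function names are hypothetical; the cut positions are derived from the GGCCNNNN/NGGCC notation applied to both strands.

```python
import re

# Lookahead so that overlapping sites, if any, are all reported.
SFI_SITE = re.compile(r"(?=(GGCC[ACGT]{5}GGCC))")
COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(seq):
    """Return the reverse complement of a DNA sequence."""
    return seq.translate(COMPLEMENT)[::-1]


def sfi_overhangs(top_strand):
    """Return (site_start, left_overhang, right_overhang) for each SfiI site.

    The top strand is cut after the fourth N and the bottom strand at the
    mirrored position, so the three middle N positions (site offsets 5..7)
    stay single-stranded. The left fragment carries them on the top strand;
    the right fragment carries their reverse complement on the bottom
    strand. Both overhangs are written 5'->3'.
    """
    results = []
    for match in SFI_SITE.finditer(top_strand.upper()):
        site = match.group(1)
        left = site[5:8]
        results.append((match.start(), left, reverse_complement(left)))
    return results


# Hypothetical sequence containing one site, GGCC ATTAC GGCC.
print(sfi_overhangs("AAAGGCCATTACGGCCTTT"))  # [(3, 'TTA', 'TAA')]
```

Because the overhang is read from the unspecified central bases, two sites with different central sequences give distinguishable left and right termini, which is what allows a linker to be matched to one end only.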

The mapping resolution of this method is an order of magnitude greater than conventional cos site oligo labeling, 
since only the ends of the centrally located insert are labeled instead of the ends of the 20kb and 9kb arms of the vector. 
The variable results generated from inaccurate size estimates of large restriction size fragments, as well as anomalous 
bands which result from the fusion of the insert with a vector fragment, are eliminated with this system. For a detailed
protocol describing the use of this system, see Sec. 11, pg. 14.
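
The logic of end-label mapping by partial digestion can be sketched as follows: when only one end of the insert is labeled, each visible band corresponds to the distance from that end to one cut site. This is a simplified model that ignores overhang lengths; the toy sequence and function names are hypothetical, and GATC is assumed as the Sau3AI recognition sequence.

```python
def cut_positions(insert, site="GATC"):
    """Return 0-based start indices of every occurrence of the site."""
    insert = insert.upper()
    positions = []
    index = insert.find(site)
    while index != -1:
        positions.append(index)
        index = insert.find(site, index + 1)
    return positions


def labeled_fragment_lengths(insert, labeled_end="left", site="GATC"):
    """Lengths of partial-digest fragments that still carry the labeled end."""
    cuts = [c for c in cut_positions(insert, site) if c > 0]
    total = len(insert)
    if labeled_end == "left":
        lengths = cuts + [total]
    elif labeled_end == "right":
        lengths = [total - c for c in cuts] + [total]
    else:
        raise ValueError("labeled_end must be 'left' or 'right'")
    return sorted(lengths)


toy_insert = "AAGATCCCCGATCTT"  # hypothetical 15-base insert with two sites
print(labeled_fragment_lengths(toy_insert, "left"))   # [2, 9, 15]
print(labeled_fragment_lengths(toy_insert, "right"))  # [6, 13, 15]
```

Reading the band sizes from each labeled end in turn gives the site positions directly, without needing to subtract vector arm lengths.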

Still further, the present invention relates to a vector having nonidentical non-symmetrical restriction enzyme
recognition site sequences, as described above, also including regulatory elements located such that the sequences of an
inserted DNA segment are transcribed, as in the LambdaGEM™ vectors above, but where the regulatory elements are
at least partly of eukaryotic origin. A principal embodiment of this aspect of the present invention is exemplified by two
λ-plasmid composite vectors, λpCEV15 and λpCEV9, the structures of which are depicted in Figure 1 and described
further below.

In cloning operations with these two vectors, a DNA segment, a cDNA, for example, is cloned between two SfiI
sites, A and B, as described in the section below relating to the automatic directional cloning method. The vectors are
designed as eukaryotic expression vectors, utilizing the M-MLV LTR promoter, and they contain the SV40 early
promoter-driven neo gene as a selectable marker.

Thus, the present invention further relates to a genetic cloning vector comprising at least one replicon, and a site
for inserting DNA segments to be cloned that includes at least two nonidentical restriction enzyme recognition
sequences that are non-symmetrical, where the vector also includes a selectable marker that is functional in eukaryotic
cells in which the vector can be replicated. The term "functional" as used here means that the gene for the marker is
expressed and that a selection scheme for that marker is operable in these eukaryotic cells.

In these two particular exemplary vectors, the form of λ-plasmid composite vector was chosen to take advantage
of the efficient packaging and high density screening in λ systems, and simpler DNA preparation and analysis in
plasmid systems. After isolation of clones of interest, pCEV plasmids with cDNA inserts can be obtained by NotI digestion
of crude λ DNA preparations and ligation followed by transformation of bacterial cells. The λ genotype which supports
healthy growth (rec+, gam+) was chosen to maintain the intactness of inserts during the amplification step of the
libraries. Deletion and/or insertion derivatives generated during the amplification step, if any, would not accumulate in the
population, since they would not have a growth advantage.

λpCEV15 has several advantages over λpCEV9 as follows: (1) λpCEV15 does not require the supF mutation in
the host cell due to the S+ allele in the λ genome. (2) λpCEV15 does not lysogenize host strains due to the deletion of the
cI gene. (3) cDNA inserts in λpCEV15 can be cut out by SalI digestion. (4) λpCEV9 loses the functional SV40 promoter
after the cDNA insert is cloned, while λpCEV15 does not. (5) λpCEV15 can accommodate longer cDNA inserts (up to
10.5 kb) than λpCEV9 (up to 8.5 kb). (6) λpCEV15 DNA contains a unique HindIII site.

λpCEV9 has at least two advantages over λpCEV15. The λpCEV9 genome has a stuffer fragment between the two
SfiI sites, which would be replaced by cDNA inserts during the cloning process. Generally, it has been found that λpCEV9
cDNA libraries have lower backgrounds than λpCEV15 libraries, presumably because the presence of the stuffer
fragment in λpCEV9 separates the two SfiI sites enough to ensure complete SfiI cleavage. It has also been observed that
λpCEV9 grows more stably without accumulation of fast-growing derivatives. This is likely due to the longer size of the
genome compared to that of λpCEV15.

In another aspect, the present invention relates to a method for cloning of DNA segments referred to as the auto- 
matic directional cloning method. In particular, the present invention relates to a method for cloning a cDNA copy of a 
eukaryotic mRNA, comprising the following steps (which are further illustrated in Figure 3 and in the Description of Spe- 
cific Embodiments, below): 

(i) annealing a linker-primer DNA segment comprising a single-stranded oligonucleotide which has oligo(dT) at the 
3' end, and a single-stranded extension at the 5' end that is included in a first non-symmetrical restriction enzyme 
recognition sequence; [Note that this first recognition sequence is identical to one of two non-symmetrical sites in
the vector that are used for directional cDNA cloning.]

(ii) enzymatically synthesizing the first strand of the cDNA from the linker-primer that is annealed with the mRNA 
molecule; [Typically, this may be accomplished using a reverse transcriptase. During the first strand synthesis reac- 
tions, the single-stranded linker-primer is repaired so as to be double-stranded. Thus, the single-stranded exten- 
sion referred to in this method may be present as such in the linker-primer, or it may be produced from a double- 
stranded region of linker-primer by cleavage with a restriction enzyme following ligation of the linker-primer to the 
cDNA.] 

(iii) enzymatically synthesizing the second strand of the cDNA using the first strand as the template under condi- 
tions such that single-stranded extensions on the synthesized cDNA molecule are made double-stranded; [Typi- 
cally, the second strand is synthesized by DNA polymerase I from the nicks on the RNA moiety introduced by 
RNase H associated with the reverse transcriptase. The linker-primer is converted to the double-stranded form in 
the first or second strand synthesis step. T4 DNA polymerase treatment makes double-stranded any single- 
stranded extensions remaining on the synthesized cDNA molecule.] 

(iv) ligating onto the blunt-ended cDNA resulting from synthesizing the second strand, an adaptor DNA segment
comprising a second non-symmetrical restriction enzyme recognition sequence that is nonidentical to the first
non-symmetrical restriction enzyme recognition sequence; [In the case of a principal embodiment of this aspect of the
invention, ligation of the adaptor directly adds one single-stranded extension to the cDNA molecule.
Alternatively, however, this extension could be exposed by cleavage of the recognition site on a double-stranded portion
of the adaptor after ligation to the cDNA.]

(v) exposing the cDNA resulting from ligation with the adaptor to one or more restriction enzymes that can cleave
the first and second non-symmetrical restriction enzyme recognition sequences under conditions such that both of
these sequences are cleaved, resulting in the cDNA having two single-stranded ends that are not
complementary;

[This restriction causes exposure of at least one of the single-stranded extensions needed on the cDNA by
cleavage of the recognition site on the repaired linker-primer portion of the cDNA molecule. If the non-symmetrical site
in the adaptor is also uncleaved at this point, it may also be restricted at this step. In a principal embodiment, a
single enzyme can cleave the non-symmetrical sites on both the linker-primer and the adaptor; but in other
embodiments, two different enzymes may be required.]

(vi) ligating the cDNA resulting from cleavage with the enzymes to DNA of a genetic cloning vector, where the
vector comprises

at least one replicon; and

a site for inserting DNA segments to be cloned that includes at least two non-symmetrical restriction enzyme
recognition sequences;

and where in the vector DNA, at least two non-symmetrical restriction enzyme recognition sequences have
been cleaved by one or more enzymes that can cleave those recognition sequences, resulting in vector DNA
having two single-stranded ends that are not complementary; wherein further,

[Thus, the two ends of the cleaved vector DNA cannot anneal and be ligated together. Cleavage of the vector
DNA at both non-symmetrical sites usually releases a short DNA segment from between them, the "stuffer"; for
the highest yield of clones containing cDNA inserts, this stuffer is removed from the cleaved vector DNA prior
to ligation of the vector with cDNA.]

one of the single-stranded ends on the cleaved vector DNA has a sequence that is complementary to the sin- 
gle-stranded extension on the linker-primer attached to the cDNA; and 

the other single-stranded end on the cleaved vector DNA has a sequence that is complementary to the
single-stranded extension on the adaptor attached to the cDNA; and
[Thus the cDNA cannot circularize and is attached to the vector in a specific direction.]

(vii) transforming a suitable host cell with the recombinant DNA segment comprising the cDNA and the vector DNA 
that results from the ligation of cDNA to vector DNA; and 

[Various genetic transformation methods known in the art may be used. In a principal embodiment, the vector is a
form of bacteriophage λ and, therefore, the recombinant DNA containing cDNA inserts is packaged in vitro into
phage particles which are then used to infect a bacterial host cell. Alternatively, for example, CaCl2 precipitation of
DNA may be used to transform host cells, especially mammalian cells.]

(viii) identifying a clone of host cells, resulting from transformation with said recombinant DNA, that contains a 
recombinant DNA segment including said cDNA. 

[Various strategies well known in the art of genetic engineering may be used to identify a clone of the desired
cDNA, including hybridization with nucleic acid probes, immunological detection of expressed antigens, and assays
for functional products, to name but a few.]

The strategy underlying this cDNA cloning method of the present invention is based on the following theory,
explained in terms of particular examples of a principal embodiment, which is presented to aid in understanding the
method and does not in any way limit the scope of the invention as defined by the appended claims.

When vector and insert DNA fragments are mixed and ligated in a typical cloning experiment, several molecules
are produced in addition to those desired. These include self-ligation products of vectors or inserts, head-to-tail or
head-to-head dimers of vector or insert, and vector DNA containing multiple inserts. Formation of these molecules would
reduce the cloning efficiency. Even when vector and insert DNAs are made by cutting with two different enzymes,
formation of ligation products such as head-to-head dimers cannot be eliminated, although the population of desired
molecules is significantly higher and insertion occurs in a defined orientation.

The reason why these self-ligation products and dimers are made, as noted above, is that the majority of restriction
enzymes in common usage recognize sequences of dyad symmetry. The two sticky ends (S+ and S-) created with such an
enzyme contain the same single-stranded extensions, and all combinations of the ends, including S+ and S+, can be
ligated. However, certain restriction enzymes cleave a non-symmetrical site (A), yielding two different sticky ends (A+
and A-). In this case, only A+ and A- ends can be ligated (see Fig. 2). When a vector DNA containing two different sites
(A and B) with this feature is cleaved by restriction enzymes of this kind, the stuffer fragment hemmed by the sites is
removed, and ligation is performed with inserts having sticky ends complementary to those of the vector, theoretically
all of the clones obtained contain single inserts in the defined orientation.
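
The pairing rule in this paragraph can be checked with a small model in which each sticky end is represented only by its single-stranded extension, written 5'->3', and two ends may be ligated when one extension is the reverse complement of the other. The extension sequences below are hypothetical stand-ins, not the actual SfiI(A) and SfiI(B) sequences of the vectors, and the function names are illustrative.

```python
from itertools import combinations_with_replacement

COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(seq):
    """Return the reverse complement of a DNA sequence."""
    return seq.translate(COMPLEMENT)[::-1]


def can_anneal(extension_a, extension_b):
    """Two single-stranded extensions pair if one is the other's reverse complement."""
    return extension_a == reverse_complement(extension_b)


def ligatable_pairs(ends):
    """List every unordered pair of named ends (including an end with a copy of itself) that can anneal."""
    names = sorted(ends)
    return [
        (x, y)
        for x, y in combinations_with_replacement(names, 2)
        if can_anneal(ends[x], ends[y])
    ]


# Dyad-symmetric site: both ends carry the same self-complementary extension.
symmetric = {"S+": "AATT", "S-": "AATT"}
# Non-symmetrical sites A and B (hypothetical 3-base extensions).
directional = {
    "vector A+": "TTA", "cDNA A-": "TAA",
    "vector B-": "GGC", "cDNA B+": "GCC",
}

print(ligatable_pairs(symmetric))    # every combination, including S+ with S+
print(ligatable_pairs(directional))  # only A+ with A-, and B- with B+
```

In the directional case no vector end pairs with another vector end and no cDNA end pairs with another cDNA end, which is the property the paragraph relies on to exclude self-ligation, dimers, and multiple inserts.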

In a principal embodiment of this aspect of the invention, the restriction enzyme SfiI was chosen to cleave both the
A and B sites, because SfiI is an infrequent cutter and leaves a non-symmetrical 3' extension of three nucleotides (Fig.
2A). Since the central 5 bases in the recognition site can be any sequence, two SfiI sites (A and B) were designed and
introduced into the vectors (Figs. 1 and 2B). The cDNA fragments to be inserted into the vectors were oriented by the
use of oligo(dT) primers having attached the sequence of the SfiI(B) site.

The steps for cDNA synthesis are schematically shown in Fig. 3. During the first strand synthesis reactions, the
single-stranded linker-primer is repaired so as to be double-stranded. After cDNA molecules are blunt-ended, an adaptor,
designated SfiI adaptor, having the 3' extension which fits to the SfiI(A+) end, is ligated. After cleavage by SfiI, the
resulting cDNA molecules have different 3' extensions which fit on the vector ends to achieve directional cloning (Fig.
2C).

Thus, regardless of the sequences of the three-base single-stranded sticky ends, inversion of one SfiI end
sequence can never produce a self-complementary sequence. Accordingly, regardless of the sequence of the five
arbitrary internal base pairs within any SfiI cleavage site, the polarity of the complementarity of the sticky ends will always
be maintained. Thus, such inherently non-symmetrical sticky ends, as well as non-symmetrical variants of recognition
sequences for which some forms can have dyad symmetry, as described above (e.g., BstXI), are also useful for
practicing the automatic directional cloning method of the present invention, for efficient ligation of DNA fragments in a
predirected order. Screenings of cDNA libraries constructed by this method, as described below, demonstrated that cDNAs
of up to 6.4 kilobase pairs containing complete coding sequences could be isolated at high efficiency. Thus, this cloning
system is particularly useful for the isolation of cDNAs of relatively long transcripts present even at low abundance in
cells.
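
The statement that a three-base sticky end can never be self-complementary can be verified by brute force. The sketch below enumerates every possible extension of a given length and keeps those equal to their own reverse complement; the function names are hypothetical.

```python
from itertools import product

COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(seq):
    """Return the reverse complement of a DNA sequence."""
    return seq.translate(COMPLEMENT)[::-1]


def self_complementary_extensions(length):
    """Return all extensions of this length that equal their own reverse complement."""
    candidates = ("".join(bases) for bases in product("ACGT", repeat=length))
    return [seq for seq in candidates if seq == reverse_complement(seq)]


print(len(self_complementary_extensions(3)))  # 0
print(len(self_complementary_extensions(4)))  # 16
```

The result for odd lengths follows from the middle base, which would have to be its own complement. For even lengths, such as the four-base extensions left by many dyad-symmetric enzymes, self-complementary sequences do exist, which is why those ends can self-ligate.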

The present invention further relates to DNA of a genetic cloning vector comprising at least one replicon; and a site
for inserting DNA segments to be cloned that includes at least two nonidentical restriction enzyme recognition
sequences that are non-symmetrical, in which the non-symmetrical restriction enzyme recognition sequences have
been cleaved by one or more enzymes that can cleave them, so that the DNA is ready for use in cloning DNA segments
having matching sticky ends.

In a specific embodiment, exemplified below, the present invention relates to a cloning vector suitable for use in
cloning DNA segments for cDNAs, by means of stable phenotypic changes induced by specific DNA segments. An
example of such vectors is λpCEV27, which differs from the vectors described above as follows:

(1) The M-MLV LTR fragment is replaced by one derived from pZIPneoSV(X); one skilled in the art will appreciate, 
however, that similar fragments can be used. 

(2) The bona fide promoter of the neo gene is removed to fuse the SV40 promoter directly to the neo structural
gene. This modification eliminates ATG codons upstream from the translation initiation site of the neo gene, thereby
increasing expression of the neo gene in mammalian cells. One skilled in the art will appreciate that the neo gene
can still be expressed from the trp-lac fused promoter in bacteria (i.e., E. coli).

(3) The second selectable marker in bacterial cells, the ampicillin resistance gene (amp), is introduced to permit
selection of transformed bacterial (i.e., E. coli) cells resistant to both ampicillin and kanamycin, thus avoiding selection of
truncated plasmid clones. One skilled in the art will appreciate that alternative markers can be used.

(4) The sites for two additional infrequent cutters, XhoI and MluI, were included along with the NotI site. Alternative
infrequent cutters can also be used to achieve efficient plasmid rescue.

(5) The multiple cloning site (MCS) contains the restriction sites for BamHI, SalI, SfiI(A), EcoRI, BglII, HindIII, SfiI(B),
SalI, and BstEII, in a more convenient order.

(6) The SP6-P and T7-P phage promoters were introduced to synthesize sense and anti-sense RNA of cDNA
inserts, respectively. Alternative promoters can also be used.

(7) A phage origin was introduced to synthesize single-stranded DNA from the vector; in the case of λpCEV27, f1 is
used.

(8) The rat preproinsulin gene polyadenylation signal is added for efficient expression of DNA inserts (alternative 
signals can be used to effect the desired end result). 

(9) The replication origin of pUC19 is used to increase the copy number of pCEV27. The replication origin of pCEV9
and pCEV15 is derived from a short fragment of pBR322; since the ori sequence lacks the promoter for replication
initiation, unstable replication results and thus lower copy number. Replication origins similar to pUC19 can also be
used to increase copy number.

Finally, the present invention also relates to a reagent kit comprising cleaved vector DNA, ready for use in cloning,
as described above, and further including a linker-primer having a single-stranded end that is complementary to one
single-stranded end of the cleaved vector DNA; and an adaptor which, after cleavage by a suitable restriction enzyme,
has a single-stranded end that is complementary to the other single-stranded end of the cleaved vector DNA. One
skilled in the art of genetic engineering would appreciate that such a kit might advantageously also include appropriate
quantities of enzymes, buffers and other reagents needed for the practice of the automatic directional cloning method
according to the teachings of the present invention.

The present invention may be understood more readily by reference to the following detailed description of specific
embodiments and the Examples and Figures included therein.

BRIEF DESCRIPTION OF THE DRAWINGS

Fig. 1. Structures of the vectors. Panel (A), λpCEV9 and panel (B), λpCEV15. Each vector contains a plasmid DNA
within the λ DNA. An expanded map of the plasmid portion is shown with derivation of the DNA segments and the
location of several restriction sites including the multiple cloning site (MCS). Arrows show the locations of the
promoters and the direction of transcription.

Fig. 2. Scheme of the automatic directional cloning system. Panel (A), nucleotide sequences of the SfiI sites. The
general structure of the SfiI site is shown at the top, where the letter N denotes any nucleotide. The two SfiI sites
specific to the vectors, SfiI(A) and SfiI(B), are shown under the general structure. The upper strands are shown in
the 5' to 3' direction, while the lower ones are in the opposite direction. The bottom of the figure shows the
sequences of the ends produced by the SfiI cleavage of the general SfiI site. The left and right half sites are
denoted as SfiI(+) and SfiI(-), respectively. Similarly, the sequences of SfiI(A+), SfiI(A-), SfiI(B+), and SfiI(B-) half
sites can be derived from the sequences of the SfiI(A) and SfiI(B) sites. Panel (B), preparation of the λpCEV vector
arms. λpCEV vector DNA is shown at the top, where cosL and cosR represent the left and right cohesive ends of
λ, respectively. The locations of the SfiI(A) and SfiI(B) sites are shown as (A) and (B), respectively. Following
ligation to seal the cohesive ends (in the middle), the DNA is cleaved by SfiI to expose the SfiI(A+) and SfiI(B-) sticky
ends of the vector molecules. The small stuffer fragment is removed to prepare the vector arms shown at the
bottom. The sequences of the single-stranded extensions are shown at both the 3'-ends of the cDNA and vector arms.
Panel (C), formation of the λ concatemers containing cDNA inserts. The cDNA fragments shown at the left side are
prepared to give the SfiI(A-) and SfiI(B+) sticky ends to the molecule by the procedure described in Fig. 3. When
the fragments are ligated with the prepared vector arms, alternating concatemers consisting of the cDNA inserts
and vector arms in the defined orientation are produced automatically due to the base-pairing specificity as shown
in the middle. In vitro packaging extracts cut out the DNA segments hemmed by the two cos sites from the
concatemer to form the active λ phage particles as shown at the bottom.

Fig. 3. Schematic view of the cDNA synthesis. An mRNA molecule is shown at the top with the cap structure
(m7Gppp) and the poly(A) stretch (AAAAA) at the 5'- and 3'- ends, respectively. The linker-primer is the
single-stranded oligonucleotide which contains the oligo(dT) at the 3' half, and the SfiI(B) site (shown by an asterisk) at
the 5' half. The first strand is synthesized by Moloney murine leukemia virus reverse transcriptase (M-MLV RT) from
the linker-primer hybridized with the RNA molecule. The second strand is synthesized by DNA polymerase I from
the nicks on the RNA moiety introduced by RNase H. The linker-primer is converted to the double-stranded form in
the first or second strand synthesis step. T4 DNA polymerase treatment makes double-stranded any
single-stranded extensions remaining on the synthesized cDNA molecule. The SfiI adaptor ligation adds one 3'
single-stranded extension, the SfiI(A-) sticky end, to the cDNA molecule. Another 3' extension, the SfiI(B+) sticky end, is
exposed by SfiI cleavage of the SfiI(B) site on the repaired linker-primer portion of the cDNA molecule.

Fig. 4. Cloning of a model insert into pCEV15 using the ADC method. Panel (A), restriction map of pCEV15-RAS.
The plasmid was constructed by cloning a 0.7 kbp fragment containing the mouse H-ras (v-bas) coding sequence
(Reddy et al., 1985) into the EcoRI site of pCEV15. The open thick arc and closed thin arc represent the H-ras
insert and vector, respectively. Panel (B), analysis of ligation products. pCEV15-RAS DNA was digested with SfiI,
and vector (4.2 kb) as well as insert (0.7 kb) fragments were purified from the gel. Similarly, EcoRI/ApaI fragments
were prepared as controls. The vector and/or H-ras insert SfiI fragments (left half) or EcoRI/ApaI fragments (right
half) were incubated in kinase ligase buffer (see below) with or without T4 DNA ligase as indicated. The ligation
products were analyzed by agarose gel electrophoresis. Sizes of the fragments are shown in kb. Panel (C), HindIII
digestion of pCEV15 and its derivatives. (lane a) pCEV15; (lane b) pCEV15-RAS; (lane c) pCEV15 containing the
H-ras insert in opposite orientation; (lane d) a marker (1-kb ladder; Bethesda Research Laboratories); and (other
lanes) plasmid DNAs isolated from 20 individual kanamycin-resistant colonies obtained by transformation of DH5α
with the vector ligated to H-ras insert SfiI fragments.

Fig. 5. PDGF receptor clones isolated from the λpCEV9-M426 cDNA library. Panels (A) and (B), cDNA clones
encoding β and α PDGF receptors, respectively. The structure of each PDGF receptor cDNA is schematically
shown with restriction sites. Open boxes represent coding sequences, while non-coding sequences are shown by
bars. The clones shown by thick lines were isolated from the M426 cDNA library. The thin lines represent clones
isolated from other libraries as described (Matsui, T. et al., 1989, Science 245, 800-804): HB15, HB3, and HB6
were derived from the human brain stem cell cDNA library in λgt11 (provided by R. Lazzarini; Matsui et al., 1989,
supra); HF1 from the Okayama-Berg human fibroblast cDNA library (Okayama and Berg, 1982); and EF17 from a
randomly-primed M426 cDNA library in λgt11 (Matsui et al., 1989, supra). Panels (C) and (D), nucleotide
sequences of 5'-untranslated regions of β and α PDGF receptor clones HPR5 and TR4, respectively. Sequencing
was performed by the chain termination method (Sanger et al., 1977, Proc. Natl. Acad. Sci. USA 74, 5463-5467).
The initiation codons are underlined.

Figure 6. Structure of the cDNA cloning-expression vector λpCEV27.

Structure of the λpCEV27 genome is shown at the upper half with the location of λ genes. The plasmid part is
enlarged and shown at the lower half as a circular map. The multiple excision site (MES) contains the restriction
sites for infrequent cutters: NotI, XhoI, PvuI, and MluI. The multiple cloning site (MCS) contains the restriction sites
for BamHI, SalI, SfiI(A), EcoRI, BglII, HindIII, SfiI(B), SalI, and BstEII, and was placed in the clockwise orientation.
The two SfiI sites are used to insert cDNA molecules by the automatic directional cloning method (Miki, T. et al.
(1989) Gene 83, 137-146), and the two SalI sites are used to release the inserts. SP6-P and T7-P represent the
phage promoters for SP6 and T7 RNA polymerases, respectively. The trp-lac fused promoter tac and SV40 early
promoter are used to express the neo structural gene in E. coli (kanamycin resistance) and eukaryotic cells (G-418
resistance), respectively. The directions of transcription from the promoters are shown by the arrows.
Polyadenylation signals are labeled as polyA. The locations of the replication origins (ori) and the ampicillin resistance gene
(amp) are shown.
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Figure 7. Strategies for expression cloning of transforming gene cDNAs. 

Cells of NIH 3T3 are transfected by ApCEV27 cDNA library DNA. Transformed cells are isolated from induced 
foci and assayed for G-41 8 resistance and colony formation in soft agar. Cells are expanded and the genomic DNA 
isolated. The DNA is digested by either NotI, Xhol or Mlul, and then ligated in a low concentration. A bacterial strain 
is transformed by the ligated DNA and colonies resistant to both the ampicillin and kanamydn are isolated. Plasmid 
DNA is extracted from each colony and used to transfect NIH 3T3 cells to examine focus formation. Since cDNA 
library-induced foci is presumed to contain multiple cDNA clones, transforming plasmids are identified in this trans- 
fection assay 

Figure 8. Detection of hep1 cDNA inserts in the ST18-2 cDNA library-induced transformants.

Genomic DNA from individual transformants (CT18-1A to CT18-1G) was digested by SalI, which can
release the cDNA inserts (see Figure 6). The digested DNA (5 µg) was separated on a 0.5% agarose gel by
electrophoresis and transferred to a supported nitrocellulose membrane (Nitrocellulose GTG, FMC BioProducts). The
Southern blot was probed with the hep1 cDNA insert of pHEP1-B, which was rescued from the transformant T18-B.
NIH 3T3 genomic DNA was used as a negative control. The location of each fragment of the molecular size
marker (1 kb ladder, BRL) is shown at the right side in kb.

Figure 9. Sequence homology of the hep1 and B-raf gene cDNAs.

A restriction map of the cDNA insert of pHEP1-B is shown schematically at the top (a). The regions where the
nucleotide sequence was determined are shown by arrows and labeled A and B. The sequences are shown
below (b). Computer analysis was performed with IntelliGenetics programs. The B-raf sequence was taken from and
numbered as in Ikawa et al. (1988).

Figure 10. Rearrangement and amplification of the hep1 locus in the primary and secondary transformants.

The sources of DNA and restriction enzymes used are shown at the top. The strains PT-1 and PT-2 are the
primary transformants induced by the original tumor DNA. The strain 18-1 was a secondary transformant induced by
PT-2 DNA and is the source of the cDNA library. The genomic DNA (5 µg) was digested by SalI to release the
cDNA inserts, separated on a 0.5% agarose gel by electrophoresis, and transferred to a supported nitrocellulose
membrane. The Southern blot was probed with the hep1 cDNA insert of pHEP1-B. NIH 3T3 genomic DNA was
used as a negative control. The locations of several fragments of the molecular size markers (1 kb ladder and high
molecular weight DNA marker, BRL) are shown at the right side in kb.

Figure 11. Detection of mRNAs for the bral and B-raf genes.

Poly(A)+ RNAs extracted from the indicated cells were denatured and separated on a formaldehyde gel.
RNA was transferred to a supported nitrocellulose membrane and probed with each probe. The 5' probe was isolated
as the SalI-HindIII fragment (see Figure 8). The 3' probe was prepared by polymerase chain reaction (PCR) using
a GeneAmp kit (Cetus Co.) from CT18-2B genomic DNA. The B-raf primer (5'-CCTCGAGATTCAAGTGATGAC-3')
and the T7 primer (5'-CTAATACGACTCACTATAGGGG-3') were used for PCR and the amplified fragment was
purified from an agarose gel.

FIG. 12 Cell morphology of control NIH/3T3 and transformants induced by keratinocyte cDNA expression
library.

NIH/3T3 cells (A) and NIH/3T3 cells transfected with the ect1 (B), ect2 (C), or ect3 (D) at 21 days
post-transfection. Cells were maintained in Dulbecco's modified Eagle's medium (DMEM) containing 5% calf serum (x 180).

FIG. 13 Specific binding of [125I]-KGF to BALB/MK, NIH/3T3, and NIH/ect1, NIH/ect2, and NIH/ect3
transformants.

Methods: Recombinant KGF was radiolabeled with [125I]Na by the chloramine-T method as described previously.
Confluent cultures in 24-well plates were serum-starved for 24 h, followed by incubation with HEPES binding buffer
(HBB: 100 mM HEPES, 150 mM NaCl, 5 mM KCl, 1.2 mM MgSO4, 8.8 mM dextrose, 2 mg/ml heparin, and 0.1%
BSA, pH 7.4) containing [125I]-KGF for 1 h at 22°C. The cells were then washed with cold PBS, lysed with 0.5%
SDS, and cell-associated radioactivity was measured in a gamma counter. Bound cpm were normalized to the cell
protein content of SDS extracts. Specific binding was determined by subtracting normalized cpm of samples
incubated with 100-fold excess unlabeled KGF from the normalized cpm bound in the presence of [125I]-KGF alone.
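
The specific-binding calculation described in these methods can be sketched as follows. This is only an illustrative sketch: the function names and the example counts are hypothetical and are not taken from the experiments.

```python
def normalized_cpm(bound_cpm, protein_mg):
    """Normalize bound counts to the protein content of the SDS extract."""
    if protein_mg <= 0:
        raise ValueError("protein content must be positive")
    return bound_cpm / protein_mg


def specific_binding(total_cpm, total_protein_mg,
                     nonspecific_cpm, nonspecific_protein_mg):
    """Specific binding = normalized total binding minus normalized
    binding measured with excess unlabeled ligand (nonspecific)."""
    total = normalized_cpm(total_cpm, total_protein_mg)
    nonspecific = normalized_cpm(nonspecific_cpm, nonspecific_protein_mg)
    return total - nonspecific


# Hypothetical well pair: labeled ligand alone vs. labeled ligand
# plus 100-fold excess unlabeled ligand.
example = specific_binding(12000.0, 0.5, 1500.0, 0.5)
```

Normalizing each well before subtraction keeps wells with different cell numbers comparable.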

FIG. 14 DNA and RNA analysis of the ect1 sequence. a, Southern analysis of the SalI-digested DNAs from
NIH/3T3 and its transformants. The blot was probed with the entire ect1 cDNA insert. Since SalI is an infrequent
cutter of mammalian DNA, most of the DNA fragments are extremely large and migrate near the origin of the gel.
However, the cDNA inserts released by SalI from the vector are shorter and migrate into the gel, allowing the
detection of the insert without detection of the endogenous ect1 gene.

b, Southern analysis of EcoRI-digested DNAs of different animal species (Clontech, Palo Alto, CA). The blot
was probed with the 5'-half of the ect1 cDNA insert and washed under reduced stringency conditions.

c, Northern analysis of NIH/3T3 and BALB/MK RNA. The blot was probed with the 5'-half of the ect1 cDNA
(lanes 1 and 2) or β-actin cDNA (lanes 3 and 4) and washed under stringent conditions.
Methods: For plasmid rescue, genomic DNA was cleaved by one of the infrequent cutters which can release
the plasmids containing cDNA inserts. Digested DNA was ligated under dilute conditions and used to
transform competent bacterial cells. Plasmids were isolated from ampicillin- and kanamycin-resistant transformants
and used to transfect NIH/3T3 cells to examine focus formation. The ect1 plasmid was rescued by XhoI,
while the ect2 and ect3 plasmids were rescued by NotI digestion. For Southern analysis, DNA (10 µg) was
digested by SalI (Panel a) or EcoRI (Panel b), fractionated by agarose gel electrophoresis, and transferred to
a nylon-supported nitrocellulose paper (Nitrocellulose-GTG, FMC, Rockland, ME). The blot in Panel a was
hybridized with the 32P-labeled entire ect1 insert at 42°C and washed at 55°C in 0.1 x SSC, while the blot in
Panel b was hybridized with the 32P-labeled 5'-ect1 probe (see Fig. 15b) at 37°C and washed at 55°C in 0.1
x SSC. Location of DNA molecular weight markers (BRL, Gaithersburg, MD) is indicated in kb. For Northern
analysis (Panel c), poly(A)+ RNA (5 µg each) was fractionated on a formaldehyde gel, transferred to
Nitrocellulose GTG, and hybridized with the 5' ect1 probe (lanes 1 and 2). After autoradiography, the filter was boiled to
remove the probe and then hybridized with a β-actin probe (Gunning et al., Molec. Cell. Biol. 3, 787-795 (1983))
(lanes 3 and 4). Location of molecular weight markers (BRL, Gaithersburg, MD) is indicated in kb.



All hybridization experiments were performed at the indicated temperature in a solution containing 50%
formamide, 5 x SSC, 2.5 x Denhardt's solution, 7 mM Tris-HCl (pH 7.5), 0.1 mg/ml of denatured calf thymus DNA, and
0.1 mg/ml of tRNA.

FIG. 15 Nucleotide sequence of the KGF receptor cDNA.

a, Nucleotide sequence and deduced amino acid sequence of the coding region of the KGF receptor cDNA.
Nucleotides are numbered from the 5'-end of the cDNA. Initiation and termination codons are underlined.
Amino acids are numbered from the putative initiation site of translation and shown above the amino acid
sequence. Potential sites of N-linked glycosylation are indicated by dots above the residues. The potential
signal peptide and transmembrane domains are underlined. The interkinase domain is shown by underlined italic
letters. Glycine residues considered to be involved in ATP binding are indicated by asterisks. Cysteine residues
delimiting two immunoglobulin-like domains in the extracellular portion of the molecule are indicated by : over the
residues. Nucleotide sequence was determined by the chain termination method (Sanger et al., PNAS 74,
5463-5467 (1977)).

b, Structural comparison of the predicted KGF and bFGF receptors. The region used as a probe for Southern
and Northern analysis (Fig. 14b and c) is indicated. The region homologous to the published bek sequence
is also shown. The schematic structure of the KGF receptor is shown below the restriction map of the cDNA
clone. Amino acid sequence similarities with the smaller and larger bFGF receptor variants are indicated. S,
signal peptide; IG1, IG2, and IG3, immunoglobulin-like domains; A, acidic region; TM, transmembrane
domain; JM, juxtamembrane domain; TK1 and TK2, tyrosine kinase domains; IK, interkinase domain; C,
C-terminus domain. Amino acid sequence comparison was performed using the method of Pearson and Lipman
(Pearson et al., PNAS 85, 2444-2448 (1988)).

FIG. 16 Competition of KGF, aFGF, and bFGF for [125I]-KGF binding on BALB/MK cells (A) and NIH/ect1
cells (B).

Methods: Binding assays were performed as described for Fig. 14, except that cells were incubated with
[125I]-KGF in the presence of unlabeled KGF, aFGF or bFGF at concentrations indicated on the x-axis. For Scatchard
analysis, samples contained several concentrations of [125I]-KGF (1-100 ng/ml) in the presence or absence of a
100-fold excess unlabeled KGF, and were also processed as outlined in Fig. 13. Estimates of receptor affinity and
total binding capacity were made using LIGAND software (Munson et al., Anal. Biochem. 107, 220-239 (1980)).
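
For orientation, the classic linear Scatchard transformation can be sketched as below. This is not the LIGAND program cited above, which fits binding models by a different, nonlinear procedure; it is a minimal illustration assuming a single class of non-interacting sites, and all names and data values are hypothetical.

```python
def scatchard_fit(bound, free):
    """Least-squares line through (bound, bound/free).

    For one class of sites, bound/free = Bmax/Kd - bound/Kd, so the
    slope is -1/Kd and the x-intercept is Bmax. Returns (Kd, Bmax).
    """
    if len(bound) != len(free) or len(bound) < 2:
        raise ValueError("need at least two paired measurements")
    x = list(bound)
    y = [b / f for b, f in zip(bound, free)]
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    sxx = sum((xi - mean_x) ** 2 for xi in x)
    sxy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    kd = -1.0 / slope
    bmax = -intercept / slope
    return kd, bmax


def one_site_bound(free, kd, bmax):
    """Specific binding predicted by a one-site model."""
    return bmax * free / (kd + free)


# Synthetic, noise-free data over a 1-100 range of free ligand
# (arbitrary units); Kd = 5 and Bmax = 20 are made-up values.
free_conc = [1.0, 3.0, 10.0, 30.0, 100.0]
bound_vals = [one_site_bound(f, kd=5.0, bmax=20.0) for f in free_conc]
kd_est, bmax_est = scatchard_fit(bound_vals, free_conc)
```

With real data, specific binding (total minus nonspecific) would be used for the bound values, and curvature in the plot would suggest that the one-site assumption does not hold.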

FIG. 17 a, Covalent affinity crosslinking of [125I]-KGF to BALB/MK (left), NIH/3T3 (center), and NIH/ect1 cultures
(right). The left and center panels of this autoradiogram were exposed to Kodak XAR film for 72 h at -70°C; the right
panel is an 18 h exposure of the same autoradiogram. The second lane for each cell type shows crosslinking performed
in the presence of excess unlabeled KGF. Molecular weight markers are indicated on the left; the positions of
[125I]-KGF-crosslinked complexes are indicated by arrows.
b, Autoradiogram of phosphotyrosyl-proteins from intact NIH/3T3 (left) and NIH/ect1 cells (right) before and after
treatment with KGF. Molecular weight markers are indicated on the left; the estimated molecular weights of proteins
displaying KGF-stimulated phosphorylation on tyrosine are shown at right.

Methods: Samples for covalent crosslinking were prepared from confluent, serum-starved cultures in 6 cm dishes,
using 10 ng/ml [125I]-KGF in the presence or absence of 30-fold excess KGF. After binding (as described for Fig.
17), crosslinking with disuccinimidyl suberate was performed as described previously. The cells were then scraped
into cold HBB containing 0.1 mM aprotinin and 1.0 mM phenylmethylsulfonylfluoride, and a crude membrane
fraction was generated by brief sonication (50 W, 10 sec), low-speed centrifugation (600 x g, 10 min), and high-speed
centrifugation (100,000 x g, 30 min) of the low-speed supernatant. The membrane pellet was solubilized in
Laemmli sample buffer (Laemmli, Nature 227, 680-685 (1970)) containing 100 mM dithiothreitol, and boiled for 3 min.
[125I]-labeled proteins were resolved by 7.5% SDS-PAGE and autoradiography of the dried gel. Analysis of
phosphoproteins was performed as follows. Confluent cultures in 10 cm dishes were serum-starved for 24 h, then
treated with (+) or without (-) KGF (30 ng/ml) for 10 min at 37°C. The medium was aspirated, and the cells were
solubilized in cold HEPES buffer containing 1% Triton X-100, protease- and phosphatase-inhibitors as described
previously. The lysate was cleared by centrifugation, and phosphotyrosyl proteins were immunoprecipitated with
affinity-purified anti-Ptyr adsorbed to GammaBind G-agarose (Genex, Gaithersburg, MD). Phosphotyrosyl proteins
were specifically eluted using 50 mM phenyl phosphate, diluted in Laemmli sample buffer, and resolved by 7.5%
SDS-PAGE. Proteins were then transferred to nitrocellulose and detected with anti-Ptyr and [125I]-protein-A as
described previously.


DESCRIPTION OF SPECIFIC EMBODIMENTS 

In one aspect, the present invention relates to a vector having nonidentical non-symmetrical restriction enzyme
recognition site sequences, as described above, also including regulatory elements located such that the sequences of an
inserted DNA segment are transcribed, where the regulatory elements are at least partly of eukaryotic origin. A
principal embodiment of this aspect of the present invention is exemplified by two λ-plasmid composite vectors, λpCEV15
and λpCEV9, the structures of which are depicted in Figure 1 and described further below, in Example 1.

To examine the performance of the ADC method, a model H-ras insert was prepared so as to have SfiI(A-) and
SfiI(B+) ends (Fig. 4A) and ligated with the pCEV15 SfiI fragment. To show the difference between the ADC method
and the "forced" cloning method using two different restriction enzymes, a similar H-ras fragment with EcoRI and ApaI
ends was prepared (Fig. 4A) and ligated with the pCEV15 EcoRI/ApaI fragment. To measure the efficiency of cDNA
cloning using a natural template, 2.5 µg of a poly(A)+ RNA preparation was denatured by heating and used to
synthesize cDNA from a linker-primer. The results of all these experiments, described in Example 2, below, illustrate the
remarkable efficiency of cloning of model inserts using this novel method of the present invention.

To assess the performance of the cDNA cloning method, cDNA was synthesized using poly(A)+ RNA extracted
from M426 human embryonic lung fibroblast cells under the conditions described in Example 2, below. cDNA molecules
larger than 1 kb were selected by low melting point agarose gel electrophoresis, and two aliquots were used to clone
into λpCEV9 and λpCEV15. The average size of the cDNA inserts was 2.0 kb in the λpCEV9 library (6x10® independent
clones) and 2.2 kb in the λpCEV15 library (1x10^ independent clones).

To characterize the M426 cDNA library in λpCEV9 further, it was screened for the human β PDGF receptor cDNA.
Before this cDNA library was constructed, clones isolated (HB3 and HB15) by screening several other libraries did not
contain the entire coding sequence (Fig. 5A). When a part (9x10^ pfu) of the M426 cDNA library was screened for the
human β PDGF receptor, six clones were isolated. Of these, three contained inserts of approximately 5 kb. Sequence
analysis showed that two (HPR2 and HPR5) contained the entire coding sequence (Fig. 5A). Recently, Matsui et al.
(1989, supra) have identified the cDNA of a novel PDGF receptor, designated the α PDGF receptor, by isolation of
overlapping cDNA clones (HF1, HB6 and EF17 in Fig. 5B). Re-probing of filters from the M426 cDNA library for the human α
PDGF receptor resulted in the isolation of 93 clones. Of 7 clones analyzed, 5 including TR4 contained inserts of 6.4 kb,
which corresponded to the size of the message (Fig. 5B). As shown in Figs. 5C and 5D, sequence analysis of α and β
receptor cDNAs isolated from the M426 library revealed 5'-untranslated sequences followed by initiation codons for the
complete coding sequence of each gene. These results indicated that the cDNA cloning system described here is
suitable for isolation of relatively long cDNAs.

This method has been used for more than one year in the laboratory of these inventors, without public disclosure,
to construct several cDNA libraries. Screening of the libraries for growth factors and receptors has been performed, and
in most cases cDNA clones containing the entire coding sequence have been obtained as single clones. For example,
a number of cDNA clones encoding keratinocyte growth factor have been isolated from the λpCEV9-M426 cDNA library
using oligonucleotide probes. Screening of an MCF7 cDNA library in λpCEV9 constructed by the ADC method for a
novel erbB-related gene resulted in the isolation of cDNA clones of 5 kb with high frequency as well. All of these
findings indicate that the ADC method using λpCEV vectors makes it possible to clone relatively long cDNAs very
efficiently.
From the present data, the following conclusions may be drawn:

(1) The ADC procedure resulted in very high cloning efficiency (10^-10^ clones/µg of mRNA).

(2) Backgrounds of libraries constructed by this method are usually very low. When the vector arms are prepared
carefully, almost all of the clones contain cDNA inserts.

(3) cDNAs are inserted into the vectors in a directed orientation and as single inserts, making analysis of cDNA
inserts simple and straightforward.

(4) The vectors can accommodate longer inserts than other λ vectors without sacrificing cloning efficiency, making
it possible to clone relatively long cDNA fragments.

(5) Plasmids carrying cDNA inserts can be released from λ genomes by NotI cleavage. This feature facilitates the
structural analysis of cDNA clones, permits the generation of size-selected plasmid sub-libraries, and makes it
possible to recover the cDNA clones by plasmid rescue from eukaryotic cells.

The inventors have further refined the vectors to allow high levels of cDNA expression in mammalian cells and the
ability to perform plasmid rescue. Example 3 tests the potential of the improved approach by its application to the
isolation and characterization of unknown oncogenes from hepatocellular carcinomas of the B6C3F1 mouse strain,
extensively utilized in long term carcinogenesis testing in the United States.

Example 4 discloses the utility of the refined directional cDNA library expression vector for isolating the keratinocyte
growth factor (KGF) receptor cDNA by creation of an autocrine transforming loop. This expression cloning approach
successfully identified and functionally cloned the receptor for the new growth factor.

Example 1. Construction of λpCEV15 and λpCEV9.

The following materials and methods were used in this and the subsequent examples, as needed. 

Restriction enzymes, DNA polymerases, T4 DNA ligase, and T4 polynucleotide kinase were purchased from New
England BioLabs, Bethesda Research Laboratories, and Boehringer Mannheim. M-MLV reverse transcriptase and
RNase H were from Bethesda Research Laboratories. Bacterial strain LE392: F-, hsdR511(rk- mk+) supE44 supF58
lacY1 or Δ(lacIZY)6 galK2 galT22 metB1 trpR55 was used as a host for λ. DH5α (Bethesda Research Laboratories)
was used for bacterial transformation. NZY broth (10 g NZ amine, 5 g NaCl, 5 g yeast extract in 1 l, pH 7.5) was used to
grow bacterial strains. M426 is a human lung embryonic fibroblast cell line (Aaronson and Todaro, 1968, J. Virology 36,
254-261). Oligonucleotides were synthesized by a Beckman System 1 DNA Synthesizer and purified by high
performance liquid chromatography. Oligonucleotides utilized had the following sequences.

#1: GATCCGTCGACGGCCATTATGGCCAGAATTCTGGGCCCG,
#2: TCGACGGGCCCAGAATTCTGGCCATAATGGCCGTCGACG,
#3: AATTCAGGCCGCCTCGGCCAAGCTTAGATCTGGGCCCG,
#4: TCGACGGGCCCAGATCTAAGCTTGGCCGAGGCGGCCTG,
#5: TGGATGGATGG,
#6: CCATCCATCCATAA,
#7 and #8: GGACAGGCCGAGGCGGCC(T)n, where n = 20 or 40 in the case of #7 and #8, respectively.
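
The two SfiI sites carried by linker oligonucleotides #1 and #3 can be located, and their cohesive ends compared, with a short script. This is a sketch only: it assumes the commonly documented SfiI cut position (GGCCNNNN^NGGCC, giving a 3-nucleotide 3' extension), and the function names are hypothetical.

```python
import re

# SfiI recognition sequence: GGCC, five unspecified bases, GGCC.
# The lookahead lets overlapping candidate sites be reported too.
SFI_SITE = re.compile(r"(?=(GGCC[ACGT]{5}GGCC))")
COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(seq):
    """Return the reverse complement of an uppercase DNA string."""
    return seq.translate(COMPLEMENT)[::-1]


def sfi_overhangs(seq):
    """List (position, left_end, right_end) for each SfiI site.

    Assuming cleavage at GGCCNNNN^NGGCC, the single-stranded 3'
    extension covers spacer bases 2-4. left_end is the extension on
    the upstream fragment; right_end is the extension on the
    downstream fragment (both written 5'->3').
    """
    ends = []
    for match in SFI_SITE.finditer(seq):
        site = match.group(1)
        left_end = site[5:8]
        right_end = reverse_complement(left_end)
        ends.append((match.start(), left_end, right_end))
    return ends


def can_pair(end_a, end_b):
    """True if two 3' extensions are complementary and can anneal."""
    return end_a == reverse_complement(end_b)


OLIGO_1 = "GATCCGTCGACGGCCATTATGGCCAGAATTCTGGGCCCG"  # carries SfiI(A)
OLIGO_3 = "AATTCAGGCCGCCTCGGCCAAGCTTAGATCTGGGCCCG"   # carries SfiI(B)

site_a = sfi_overhangs(OLIGO_1)[0]
site_b = sfi_overhangs(OLIGO_3)[0]

# After the segment between the two sites is removed, the vector keeps
# the upstream end of site A and the downstream end of site B.
vector_ends = (site_a[1], site_b[2])
vector_can_self_ligate = can_pair(vector_ends[0], vector_ends[1])
```

Because the two retained extensions are not complementary to each other, and a 3-nucleotide extension cannot be complementary to itself, this model reproduces the property the method relies on: neither vector end can close on the other without an insert.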

Plasmid DNA was prepared by the "selective precipitation procedure", which is a modification of the alkaline lysis
method (Birnboim and Doly, 1979, Nucleic Acids Res. 7, 1513-1523). This technique makes it possible to prepare
sufficiently pure plasmid DNAs to analyze and alter structures, without a requirement for lysozyme treatment, phenol
extraction or repeated ethanol precipitations. Cells collected from a 10 ml culture were resuspended into 0.2 ml of TEG (25
mM Tris-HCl pH 8.0/10 mM EDTA/50 mM glucose). After transfer to a microcentrifuge tube, 0.2 ml each of 2% sodium
dodecyl sulfate and 0.4 M NaOH was added, mixed, and incubated at room temperature for 5 min. After the addition of
0.2 ml of 3 M ammonium acetate (pH 4.8), incubation at 0°C for 10 min, and centrifugation for 15 min in a
microcentrifuge, the supernatant was transferred to a fresh tube containing 0.2 ml of 2 M Tris-HCl (pH 8.9) and 2 µl of 2 mg/ml
RNase A. Following incubation at 37°C for 30 min and centrifugation, the supernatant was transferred to a new tube
containing 0.6 ml cold isopropanol. The tube was inverted several times and incubated at room temperature for 10 min.
DNA was collected by centrifugation and then washed with 75% ethanol. The pellet was dried by incubation at 37°C for
5 min and resuspended into 50 µl of 10 mM Tris-HCl (pH 8.0), 1 mM EDTA.

The λ DNAs prepared as follows were used to modify the structure, to analyze cDNA clones in λpCEV vectors, and
to rescue the plasmid part. Host cells grown in 10 ml of NZY medium containing 2 mM MgCl2 were suspended into
the same volume of SM buffer (50 mM Tris-HCl, pH 7.5/8 mM MgSO4/100 mM NaCl/0.01% gelatin). A single plaque
picked by a Pasteur pipet was incubated with 0.1 ml of the host cell suspension at 37°C for 30 min. Ten ml of pre-warmed
NZY broth containing 2 mM MgCl2 was added and shaken at 37°C for 6 h. This procedure allows single-step production
of high-titer lysates. Phage particles were precipitated and DNAs prepared as described by Arber et al., 1983 [In
Hendrix, R. W., Roberts, J. W., Stahl, F. W. and Weisberg, R. A. (Eds.), Lambda II. Cold Spring Harbor Laboratory, Cold
Spring Harbor, NY, 1983, pp. 433-466], with several modifications. A few drops of chloroform were added to the lysate,
mixed, and debris was removed by centrifugation. After the chloroform remaining in the lysate was removed by
incubation at 37°C, 50 µl each of DNase I (1 mg/ml) and RNase A (1 mg/ml) were added, and incubated at 37°C for 1 h. Phage
particles were precipitated by the addition of 5 ml of 30% PEG/3 M NaCl/10 mM MgCl2 followed by incubation on ice
for 1 h. Phage particles were collected by centrifugation at 3,000 rpm for 30 min, and the pellet was resuspended in
0.5 ml of SM. The suspension was transferred to a fresh microcentrifuge tube containing 20 µl of proteinase K (20
mg/ml). After incubation at 37°C for 15 min, 50 µl of 100 mM Tris-HCl (pH 8.0) / 100 mM EDTA / 1% sodium dodecyl sulfate
was added and the tube was incubated at 65°C for 30 min. The released DNA was extracted by phenol/chloroform and
then by chloroform. DNA was precipitated by 0.6 volume of isopropanol with 0.3 volume of 7.5 M ammonium acetate,
washed with 75% ethanol and dried. The pellet was dissolved in 10 mM Tris-HCl (pH 8.0), 1 mM EDTA, 0.1 mg/ml of
RNase A and incubated at 37°C for 30 min to digest ribosomal RNAs.

Plasmid DNAs prepared by the selective precipitation method were directly used to modify the structures, by
restriction enzyme digestions, repair reactions, and/or ligation with synthetic linkers. DNA fragments were separated on
agarose gels and then purified using GENECLEAN (BIO 101 Inc.).

Insertion of oligonucleotides was performed as follows. The two strands of non-phosphorylated oligonucleotides
were annealed and ligated with plasmid DNA which had been digested with suitable restriction enzymes. The strand of
the oligonucleotides which was not ligated (due to the 5'-OH structure) was removed by heating and then separated by
agarose gel electrophoresis. The purified fragments were phosphorylated and then ligated.

Plasmid pCEV9 was constructed as follows. A retroviral vector pDOL' (Korman et al., 1987, Proc. Natl. Acad. Sci.
USA 84, 2150-2154) was cleaved by XbaI and recircularized to remove the polyoma segment. The ClaI site was
converted to a NotI site by linker insertion, and the EcoRI site was removed by repair ligation. A synthetic MCS linker
consisting of oligonucleotides #1 and #2 (listed above) was inserted between the SalI and BamHI sites, and the
BclI/BamHI fragment of SV40 DNA (Bethesda Research Laboratories) containing its polyadenylation signal was
inserted at the XhoI site. These manipulations produced pCEV9.

λgt11 DNA (Young and Davis, 1983, Proc. Natl. Acad. Sci. USA 80, 1194-1198) was ligated and then cleaved by
EcoRI and XbaI. The ends were repaired and NotI-linkered, and then pCEV9 DNA linearized by NotI was cloned into
the λ DNA to produce λpCEV9.

pCEV15 was constructed by modifying pCEV8, which is a parental plasmid of pCEV9 and lacks the MCS linker. A
tac promoter fragment (Pharmacia) and XhoI linker were inserted in the SalI site. SfiI(B), HindIII, and BglII sites were
removed successively by restriction enzyme digestion, polymerase treatment, and subsequent ligation reaction.
Removal of the SfiI site (on the SV40 replication origin) did not impair the SV40 early promoter activity. The MCS linker
was inserted between the BamHI and SalI sites, and SfiI(B), HindIII and BglII sites were introduced again by insertion
of the MCS-2 linker (oligonucleotides #3 and #4, listed above) between the EcoRI and ApaI sites. The resulting plasmid
pCEV15 was cloned in a new λ vector constructed as follows. The segment spanning from the XhoI site to the right cos
end of λpCEV9 was replaced by the corresponding fragment of λCharon28 (Rimm et al., 1980, Gene 12, 301-309), to
introduce the cI deletion (KH54) and remove three HindIII sites. The resulting phage λpCEV9c DNA was ligated,
cleaved by HindIII, and then repaired. A NotI linker was ligated to the repaired HindIII ends and the DNA was cleaved
by NotI. The λ arms were purified and ligated with NotI-digested pCEV15 DNA, to produce λpCEV15.


Example 2. Efficiency of cloning and orientation of model inserts.

Preparation of λ arms and the SfiI adaptor for all cloning experiments was performed as follows. λ DNA was ligated
to seal cohesive ends and then cleaved sequentially by SfiI and EcoRI. After phenol/chloroform extraction, λpCEV9
arms were purified by centrifugation through a 5-20% potassium acetate gradient (Maniatis et al., 1982). λpCEV15
arms were purified by passage through a Sephadex G-50 spin column (Boehringer Mannheim Biochem.).

The SfiI adaptor was prepared as follows. About 1 nmol of oligonucleotides #5 and #6 were separately
phosphorylated by T4 polynucleotide kinase, mixed, heated at 80°C for 5 min and then slowly cooled to 4°C.
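
The structure that annealing of oligonucleotides #5 and #6 is expected to produce can be checked in a few lines. This is an illustrative sketch with hypothetical function names; it only models base-pairing of the two strands as perfect Watson-Crick duplex formation.

```python
COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(seq):
    """Return the reverse complement of an uppercase DNA string."""
    return seq.translate(COMPLEMENT)[::-1]


def three_prime_tail(long_oligo, short_oligo):
    """Return the unpaired 3' tail of long_oligo after annealing.

    The short oligo is assumed to pair with the 5' portion of the long
    oligo, leaving one blunt end. Returns None if the two strands are
    not complementary in that register.
    """
    paired_region = reverse_complement(short_oligo)
    if long_oligo.startswith(paired_region):
        return long_oligo[len(paired_region):]
    return None


OLIGO_5 = "TGGATGGATGG"
OLIGO_6 = "CCATCCATCCATAA"

adaptor_tail = three_prime_tail(OLIGO_6, OLIGO_5)
```

Under this model the duplex has one blunt end, which can be joined to blunt-ended cDNA, and one 3-nucleotide 3' extension, which serves as the cohesive end.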

RNA for making cDNAs was extracted and poly(A)+ RNA selected as described by Okayama et al. (1987). cDNA
was synthesized essentially according to D'Alessio et al. (1987) with some modifications. About 2.5 µg of poly(A)+ RNA
in 10 µl of H2O was mixed with 0.5 µl of 100 mM methylmercuric hydroxide (MeHg), and incubated at room temperature
for 5 min, followed by addition of 0.5 µl of 1.4 M β-mercaptoethanol. After 5 min, 1.2 µl of RNasin (40 units/µl; Promega
Biotech.), 17.8 µl of H2O, 10 µl of 5x PS buffer (250 mM Tris-HCl, pH 8.3/375 mM KCl/15 mM MgCl2/100 mM
dithiothreitol), 2.5 µl of dNTP mixture (10 mM each of dGTP, dATP, dTTP and dCTP), 5 µl of linker-primer (oligonucleotide #8;
1 µg/µl) and 2.5 µl of M-MLV reverse transcriptase (200 units/µl) were sequentially added, mixed, and incubated at
37°C for 1 h. The tube was chilled on ice and 290 µl of H2O, 7.5 µl of dNTP mix (10 mM each of dGTP, dATP, dCTP,
and TTP), 40 µl of 10x SS buffer (188 mM Tris-HCl, pH 8.3 / 906 mM KCl / 46 mM MgCl2 / 38 mM DTT), 10 µl of DNA
polymerase I (1.25 units/µl) and 1.8 µl of RNase H (0.25 units/µl) were added, mixed and incubated at 16°C for 2 h.
The reaction mixture was heated at 70°C for 10 min, and 5 µl of T4 DNA polymerase (1 unit/µl) was added and
incubated at 37°C for 10 min. The reaction was terminated by the addition of 40 µl of 0.25 M EDTA, and the mixture was
extracted by phenol/chloroform twice followed by chloroform twice. cDNA was ethanol-precipitated from 2.5 M
ammonium acetate, washed, and then dried. The pellet was dissolved into 10 µl of H2O and then 4 µl of SfiI adaptor (0.8
µg/µl), 4 µl of 5x ligation buffer (500 mM Tris-HCl, pH 7.6/100 mM MgCl2/10 mM ATP/10 mM dithiothreitol/50% (w/v)
polyethylene glycol-8000) and 2 µl of T4 DNA ligase (1 unit/µl) were mixed and incubated overnight. A 10 µl
aliquot was then mixed with 1 µl of 10x SfiI buffer (100 mM Tris-HCl, pH 7.9 / 500 mM NaCl / 100 mM MgCl2 / 60 mM
β-mercaptoethanol / 1 mg/ml bovine serum albumin), 2 µl of SfiI (10 units/µl) and 7 µl of H2O. Digestion was
performed at 50°C for 1 h. cDNA fragments were purified by low-melting point agarose gel electrophoresis or by passing
through a spun column (Maniatis et al., Molecular Cloning, Cold Spring Harbor, 1982) packed with Sepharose CL-4B
(Pharmacia). An aliquot of the cDNA preparation was then mixed with λpCEV vector arms and ethanol-precipitated.
DNA was dissolved in 8 µl of H2O, and then 1 µl of 10x kinase-ligase buffer (660 mM Tris-HCl, pH 7.5/100 mM
MgCl2/50 mM dithiothreitol, 500 mM ATP) and 1 µl of T4 DNA ligase (1 unit/µl) were added, mixed, and incubated at
14°C overnight. In vitro packaging was performed using GigaPack Gold (StrataGene) as directed.
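
As a consistency check on the first- and second-strand reactions, the component volumes can be tallied. The sketch below assumes that all listed volumes are in microliters (the scale at which the buffer stocks come out at 1x); the dictionary keys are descriptive labels chosen for this illustration.

```python
# Volumes in microliters (assumed unit), in the order they are added.
FIRST_STRAND_UL = {
    "poly(A)+ RNA in H2O": 10.0,
    "methylmercuric hydroxide": 0.5,
    "beta-mercaptoethanol": 0.5,
    "RNasin": 1.2,
    "H2O": 17.8,
    "5x buffer": 10.0,
    "dNTP mixture": 2.5,
    "linker-primer": 5.0,
    "M-MLV reverse transcriptase": 2.5,
}

# Additions made to the chilled first-strand reaction.
SECOND_STRAND_ADDITIONS_UL = {
    "H2O": 290.0,
    "dNTP mix": 7.5,
    "10x SS buffer": 40.0,
    "DNA polymerase I": 10.0,
    "RNase H": 1.8,
}


def total_volume(components):
    """Sum the volumes of all components of a reaction."""
    return sum(components.values())


def final_strength(stock_strength, stock_volume, reaction_volume):
    """Final buffer strength (in 'x') after dilution into the reaction."""
    return stock_strength * stock_volume / reaction_volume


first_total = total_volume(FIRST_STRAND_UL)
second_total = first_total + total_volume(SECOND_STRAND_ADDITIONS_UL)

first_buffer_x = final_strength(5.0, FIRST_STRAND_UL["5x buffer"], first_total)
second_buffer_x = final_strength(
    10.0, SECOND_STRAND_ADDITIONS_UL["10x SS buffer"], second_total
)
```

On these assumptions the first-strand reaction totals about 50 µl with the 5x buffer at 1x, and the second-strand reaction totals about 400 µl with the 10x buffer at close to 1x, which supports reading the volumes as microliters.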

As shown in Fig. 4B, neither the vector nor the H-ras insert SfiI fragment self-ligated, whereas self-ligation occurred
when the EcoRI/ApaI fragments were used instead. In the ADC system, ligation occurred only when both the vector and
insert fragments were present in the reaction mixture (Fig. 4B). To characterize the directional cloning capacity of the
system, the H-ras insert and vector SfiI fragments were ligated and used to transform E. coli strain DH5α, and 20
kanamycin-resistant colonies were analyzed. As shown in Fig. 4C, all plasmids contained single inserts in the expected
orientation, indicating that the ADC method provides both directional cloning and positive selection for the presence of
inserts. To further examine the performance of the ADC method using the λ system, model inserts prepared to have
SfiI(A-) and SfiI(B+) ends were ligated with λpCEV9 arms (see Fig. 2C). As shown in Table I, λpCEV9 arms alone did not
produce active phages efficiently even when the ligation reaction was carried out, while the presence of model inserts in
the ligation mixture increased the titer of active phages by three orders of magnitude. These results indicated that
successful ligation and phage propagation depended on the presence of the model insert in the reaction mixture. All of
these findings indicated that the cloning procedure results in low background and efficient directional cloning.

Table I

Packaging efficiency of λpCEV9 DNA

DNA                                     Titer (a) (pfu/µg λ arms)
λpCEV9 arms                             1 x 10^
λpCEV9 arms, ligated                    8 x 10^
λpCEV9 arms + Insert A (b), ligated     8 x 10^
λpCEV9 arms + Insert B (c), ligated     8 x 10^

Footnotes for Table I:

(a) The reaction mixture contained 66 mM Tris-HCl (pH 7.5), 10 mM MgCl2, 5 mM dithiothreitol, 50 µM ATP, and
0.1 µg/ml of DNA. Incubation was performed at 14°C overnight, and the phage were produced by in vitro packaging
and titered on LE392.

(b) A 2 kb DNA fragment having the SfiI(A-) and SfiI(B+) ends (see Fig. 2).

(c) A DNA fragment similar to insert A, except that the SfiI(A-) end was created by ligation of the SfiI adaptor.
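The self-ligation behaviour reported for Fig. 4B follows from base pairing: a single-stranded overhang can anneal to a second copy of itself only if it equals its own reverse complement. The following is a minimal Python sketch of that check, not part of the original disclosure. The EcoRI (AATT) and ApaI (GGCC) overhangs are the standard ones for those enzymes. The two three-base SfiI overhangs are hypothetical placeholders, because the actual vector sequences are not given in this section.

```python
# Sketch: which restriction overhangs can anneal to themselves?
# All overhangs are written 5'->3'.
COMPLEMENT = str.maketrans("ACGT", "TGCA")

def reverse_complement(seq):
    """Return the reverse complement of a DNA sequence."""
    return seq.upper().translate(COMPLEMENT)[::-1]

def can_anneal(overhang_a, overhang_b):
    """Two overhangs pair when one is the reverse complement of the other."""
    return overhang_a.upper() == reverse_complement(overhang_b)

def can_self_anneal(overhang):
    """An end can ligate to a copy of itself only if it is self-complementary."""
    return can_anneal(overhang, overhang)

# Standard overhangs for EcoRI and ApaI; SfiI overhangs are HYPOTHETICAL.
OVERHANGS = {
    "EcoRI": "AATT",
    "ApaI": "GGCC",
    "SfiI end A (hypothetical)": "TTA",
    "SfiI end B (hypothetical)": "GTC",
}

if __name__ == "__main__":
    for name, overhang in OVERHANGS.items():
        print(name, overhang, "self-anneals:", can_self_anneal(overhang))
```

An odd-length overhang, such as the three-base extension left by SfiI, can never be self-complementary, because its middle base would have to pair with itself. In this sketch an insert end carrying `reverse_complement("TTA")` would pair with hypothetical end A only, which is the kind of specificity that fixes the insert orientation.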

To measure the efficiency of cDNA cloning using a natural template, 2.5 µg of a poly(A)+ RNA preparation was
denatured by heating and used to synthesize cDNA from a linker-primer (oligonucleotide #7). The cDNA was
blunt-ended, and the SfiI adaptor was ligated to both ends. The molecules were cleaved partially by SfiI, and then cloned
in λpCEV9. A total of 2.5 x 10^ plaque forming units (pfu) was obtained, indicating that the method was extremely
efficient.

Since SfiI is an infrequent cutter, almost all cDNA species in the libraries constructed by the ADC method should
remain intact. Nonetheless, cDNAs containing SfiI sites might be excluded from our cDNA libraries. To solve this
problem, partial SfiI digestion is usually performed. An alternative strategy involves cDNA synthesis from a linker-primer
containing the recognition site of another infrequent cutter, MluI, in addition to the SfiI site. The cDNA preparation could
then be divided into two parts, one cleaved by SfiI and the other by MluI. A short oligonucleotide ligated to the cDNAs
could be utilized to convert the MluI end to an SfiI(B+) end.
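The concern about internal sites can be illustrated with a short scan of a cDNA sequence for the two recognition sequences. This is a minimal Python sketch under stated assumptions: the recognition sequences used are the standard ones for SfiI (GGCCNNNNNGGCC) and MluI (ACGCGT), while the function names and the example sequences are invented for illustration and do not come from the disclosure.

```python
import re

# Standard recognition sequences, written as regular expressions.
SITES = {
    "SfiI": "GGCC[ACGT]{5}GGCC",
    "MluI": "ACGCGT",
}

def find_sites(seq, pattern):
    """Return 0-based start positions of every (possibly overlapping) match."""
    # A lookahead is used so that overlapping sites are all reported.
    return [m.start() for m in re.finditer("(?=" + pattern + ")", seq.upper())]

def enzymes_without_internal_sites(cdna):
    """List the enzymes that would leave this cDNA intact on complete digestion."""
    return [name for name, pattern in SITES.items() if not find_sites(cdna, pattern)]

if __name__ == "__main__":
    # Made-up example: contains one internal SfiI site and no MluI site.
    example = "ATGGCCAAAATGGCCTTTT"
    print(find_sites(example, SITES["SfiI"]))
    print(enzymes_without_internal_sites(example))
```

A cDNA for which the returned list contains only one enzyme would be lost from the half of the preparation cut with the other enzyme, which is the rationale for dividing the preparation into two parts.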
Example 3. Isolation of a Mouse Hepatoma Oncogene cDNA Using a Novel Phenotypic Expression Cloning
System.

The following protocols/methodologies are referred to in sections 3.1-3.6 below.


cDNA Library Construction 

cDNA libraries were constructed as described above, except for the use of a newly designed adaptor and vector. The
new SfiI adaptor does not contain an ATG codon in the sense strand and consisted of two oligonucleotides,
5'-CCAATCGCGACC-3' and 5'-GGTCGGGATTGGTAA-3'.
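The absence of an ATG codon can be checked directly on the two oligonucleotide sequences as printed above. The following is a minimal Python sketch; the function name is illustrative only, and because this section does not state which oligonucleotide corresponds to the sense strand, both are checked.

```python
# Adaptor oligonucleotides as printed in the text (5'->3').
ADAPTOR_OLIGOS = ("CCAATCGCGACC", "GGTCGGGATTGGTAA")

def atg_positions(seq):
    """Return 0-based positions of every ATG triplet, in any reading frame."""
    seq = seq.upper()
    return [i for i in range(len(seq) - 2) if seq[i:i + 3] == "ATG"]

if __name__ == "__main__":
    for oligo in ADAPTOR_OLIGOS:
        print(oligo, "ATG positions:", atg_positions(oligo))
```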

Amplification of the library and preparation of the DNA were performed by standard procedures (Maniatis et al.,
1982).

Cell Culture and DNA transfection 


All cells used were derivatives of NIH/3T3 (Jainchill, J.L. et al. (1969) J. Virol. 4, 549). Calcium phosphate
transfection (Wigler, M. et al. (1977) Cell 11, 223) was used to introduce DNA into cells. Cells were maintained in
Dulbecco's modified Eagle's medium (DMEM) containing 5% calf serum.

Plasmid Rescue

The genomic DNA (1.2 ng) isolated from CT18-2B or CT18-2C was cleaved by XhoI and extracted with
phenol-chloroform and then chloroform. The DNA was precipitated and resuspended in 355 µl of H2O. Fifty µl of 10 x
kinase-ligase buffer (660 mM Tris-HCl, pH 7.5/100 mM MgCl2/50 mM dithiothreitol/500 µM ATP) and 5 units of T4 DNA
ligase (BRL) were added, and the mixture was incubated overnight at 15°C. The ligated DNA was extracted and
precipitated as above and then resuspended in 10 µl of TE. Four aliquots (100 µl each) of PLK-F' competent cells
(Stratagene) were transformed with 0.5, 1, 2, and 4 µl of the ligated DNA as directed by the manufacturer. After the heat
shock, the cells were diluted 10-fold with S.O.C. medium (BRL) containing 1 mM IPTG to induce expression of the neo
gene driven by the tac promoter. The culture was incubated for 2 h with shaking and plated on NZY hard agar
containing ampicillin (100 µg/ml), kanamycin (25 µg/ml), and IPTG (100 µM/ml).

Recombinant DNA Techniques 

Preparation of λ and plasmid DNA was performed as described above. Genomic DNA was extracted by the standard
procedure (Maniatis et al., 1982). Total RNA was isolated and poly(A)-selected as described (Okayama et al.
(1987) PNAS 84, 8573).

Southern and Northern Analysis 

DNA fragments were isolated by Geneclean (BIO101 Inc.) and labeled by random priming using the Oligo Labeling kit
(Pharmacia). Hybridizations were performed as described (Kraus et al., 1987).

3.1 Development of an Efficient Stable Expression cDNA Cloning System 

The λpCEV27 system was developed to clone cDNAs by means of stable phenotypic changes induced by a specific
cDNA. Use of a λ-plasmid composite vector made it possible to generate high complexity cDNA libraries and to
efficiently excise the plasmid from the stably integrated phagemid DNA. This phagemid vector (Figure 6) contained
several features, including two SfiI sites for construction of cDNA libraries using the automatic directional cloning (ADC)
method, an M-MLV LTR promoter suitable for cDNA expression in mammalian cells, the SV40 promoter-driven neo
gene as a selectable marker, and multiple excision sites (MESs) for plasmid rescue from genomic DNA. The λpCEV27
system incorporated, in addition to the M-MLV LTR, the rat preproinsulin polyadenylation (polyA) signal downstream
from the cDNA cloning site (Fig. 6). In this vector, the bacterial neo gene was placed under the independent control of
the SV40 early promoter and the SV40 late polyA signal for use in marker selection in mammalian cells. In contrast to
λpCEV15, the bona fide promoter of the neo gene was removed so as to fuse the SV40 promoter directly to the neo
structural gene. Thus, in λpCEV15, expression of the neo gene in E. coli was achieved by transcription from the trp-lac
fused promoter tac, inserted upstream from the SV40 early promoter (Fig. 6). By use of the tac promoter, it was
possible to utilize IPTG-inducible selection for kanamycin resistance. Finally, the f1 replication origin and the SP6 and T7
phage promoters were included to facilitate analysis of cDNA inserts by production of single-stranded DNA and RNA
transcripts, respectively (Fig. 6).

The strategy for expression cloning of oncogene cDNAs is summarized in Figure 7. When library cDNA is used to
transfect mammalian cells, cDNA clones are integrated by recombination between λ DNA and host genomic DNA. For
plasmid rescue, genomic DNA extracted from transformants is subjected to digestion with an enzyme which can cleave
the λ-plasmid junctions. The resulting DNA can then be circularized and used for bacterial transformation. For this
purpose, the sites for two additional infrequent cutters, XhoI and MluI, were included along with the NotI site. Because of
the second selectable marker in bacterial cells, the ampicillin resistance gene (amp), it was possible to select
transformed E. coli cells resistant to both ampicillin and kanamycin, avoiding selection of truncated plasmid clones.

3.2 Characterization of Oncogenes Activated in Mouse Liver Tumors

We have previously analyzed hepatocellular tumors of the mouse strain B6C3F1 for the presence of activated
oncogenes (Reynolds et al. (1987) Science 237, 1309-1316). Although the majority were activated ras or c-raf
oncogenes, four could not be identified. The sources of these oncogenes were tumors designated OT4, OT18, OT23, and
OT28. One (OT23) was spontaneously generated, while the others were associated with chronic furfural exposure
(Reynolds, S.H., et al. (1987)). Genomic DNAs of NIH/3T3 transformants containing each of the unidentified
oncogenes were examined under low stringency hybridization conditions using a number of known and potential oncogene
probes including abl, myb, ets, fos, fgr, fms, rel, src, sis, yes, p53, ros, PDGF-A, met, dbl, lskT, myc, N-myc, rho, mos,
erbA, pim, lca, H-ras, N-ras, K-ras, c-raf, and erbB-2. None showed DNA fragments with either increased intensity or
abnormal sizes relative to those detected in NIH/3T3 control DNA (data not shown). Thus, none of these transforming
genes appeared to be closely related to any of the genes used as probes.

3.3 Expression cDNA Cloning of an Oncogene of a Furfural-Induced Hepatoma 

Using the λpCEV27 expression cloning system, we attempted to clone transforming cDNA from one
furfural-induced tumor, OT18. A cDNA library (3 x 10^ independent clones) was constructed from poly(A)+ RNA extracted
from a secondary transformant of the tumor. Transfection of 80 plates of NIH/3T3 cells with 5 µg/plate of the expression
cDNA library led to the detection of seven foci, all of which demonstrated G418 resistance. These results indicated that
each had taken up and stably integrated vector DNA, making it likely that the transformed foci were induced by
exogenous cDNA rather than arising as a result of spontaneous transformation.

Two of these transformants, designated CT18-2B and CT18-2C, were selected for plasmid rescue. By restriction
mapping of several distinct plasmids obtained from each transformant, it was possible to establish that one plasmid
rescued from each had the identical insert (data not shown). These results suggested that this cDNA might encode the
oncogene product. Transfection analysis of each rescued plasmid DNA demonstrated that these same two cDNA
clones possessed high-titered transforming activity of around 10^ ffu/nmol DNA, while none of the other plasmids
rescued from the same transformants showed detectable activity.

To determine whether other transformants induced by the cDNA library contained the OT18 oncogene, genomic
DNA extracted from each primary transformant was digested with SalI to release cDNA inserts from the vector and
subjected to Southern blotting analysis using the OT18 cDNA insert as a probe. Since SalI is an infrequent cutter for
mammalian DNA, genomic DNA was cleaved to very large fragments which remained near the origin of the gel. Thus, the
relatively small cDNA fragments released from cellular DNAs by SalI cleavage could be separated from the bulk of
genomic fragments. As shown in Figure 8, each of the cDNA library transformants contained the OT18 oncogene cDNA
insert. The sizes ranged from around 2.3 kb to 7 kb, suggesting that several of the inserts represented independent
cDNA clones of the oncogene.

3.4 Structural Analysis of T18 Oncogene cDNA

A detailed restriction map of the 2.1 kb insert of one of the transforming plasmids was constructed, and the clone
was subjected to sequence analysis. A database search indicated that the 5' portion of the cDNA contained an
unknown sequence, while the 3' region was closely related to the human B-raf (Ikawa et al. (1988) Mol. Cell. Biol. 8,
2651-2654) and chicken R-mil (Marx et al. (1988) EMBO 7, 3369-3373) genes (Figure 9a). Comparison of the predicted
amino acid sequence with that of B-raf indicated identity (Figure 9) with the exception of a single amino acid difference
at position 324, in which Gly was substituted for Ala in human B-raf.

There was also complete identity with avian R-mil except for a small stretch of nine amino acids at the R-mil
C-terminus, where recombination with an avian retroviral env gene caused this substitution.

To determine the breakpoint, we also compared the T18 nucleotide sequence with that of proto B-raf and v-R-mil.
There was no homology with either sequence upstream from position 1040 in the T18 oncogene. Thus, position 1040
represents the junction between an unknown sequence and the B-raf gene. R-mil is a viral oncogene and encodes a
gag-R-mil-env fusion protein. The junction of gag and R-mil has been mapped 144 nucleotides upstream from the T18
breakpoint (Marx et al., 1988), while the junction of a different sequence and the human B-raf oncogene was 174
nucleotides upstream from the junction in the T18 oncogene. In each of the B-raf oncogenes, including T18, the
breakpoints did not disrupt the predicted kinase domain of the protein.

3.5 Evidence for in Vivo Oncogene Activation

The human B-raf gene product is around 84% related to the amino acid sequence of the c-raf oncogene. Another
member of the raf family, A-raf, is also structurally similar. Most raf oncogenes have been activated by mechanisms
involving structural rearrangements due to recombination and loss of amino terminal sequences of the raf coding
sequence (Rapp, U.R., et al. (1988) In: Reddy, E.P. (ed.) The Oncogene Handbook. Elsevier Science Publishers B.V.).
Moreover, most reported instances have involved in vitro activation of these genes during DNA transfection rather than
mechanisms leading to oncogene activation within the tumor itself (Ishikawa, F., et al. (1987) Mol. Cell. Biol. 7,
1226; Stanton, V.P. and Cooper, G.M. (1987) Mol. Cell. Biol. 7, 1171; Ikawa, S., et al. (1988) Mol. Cell. Biol. 8,
2651-2654). The rearrangement activating the B-raf oncogene might have occurred during the course of cDNA library
construction or as an artifact of DNA transfection with the original tumor DNA. Alternatively, the oncogene might have
been activated within the original tumor itself.

While original tumor DNA was not available, it was possible to analyze two primary transformants which had been
independently induced by this tumor DNA. As shown in Figure 10, rearrangement as well as amplification of both the
non-B-raf and the B-raf related portions of the gene were found not only in the secondary transfectant, which was the
source of the cDNA library, but also in both primary transformants, PT18-1 and PT18-2. Since such in vitro
rearrangements are very rare, these findings strongly argue that the oncogene was activated in vivo in the hepatoma as
part of the neoplastic process.

3.6 Detection of the mRNAs for the T18 Oncogene. 

In an effort to characterize the B-raf oncogene transcript and search for evidence of additional B-raf oncogenes
among the 3 other hepatoma oncogenes, we subjected poly(A)-selected RNAs from primary or secondary NIH/3T3
transfectants containing each oncogene to analysis with DNA probes from the 5' (B-raf unrelated) and 3' (B-raf)
portions of the T18 oncogene. As shown in Fig. 11, control NIH/3T3 cells contained a 4.2 kb RNA that hybridized with the
5' non-B-raf related portion of the oncogene, but there was no detectable B-raf transcript. In contrast, the second cycle
T18 transfectant, which was the source of the expression cDNA library, showed a major 4.2 kb transcript as well as
minor 10 kb and 3 kb transcripts which appeared to hybridize with both probes.

Fig. 11 further shows that a primary T23 oncogene transfectant contained multiple B-raf hybridizing transcripts,
indicating that it also contained another B-raf oncogene. Of note, the several transcripts detected differed in their sizes
from those of the T18 oncogene. Moreover, none of these transcripts was detected by the B-raf probe of the T18
oncogene (Fig. 11). Thus, if the T23 oncogene arose by a mechanism involving B-raf gene rearrangement, this
rearrangement was different from that associated with activation of the T18 oncogene. The transfectant induced by the T28
oncogene (Fig. 11) did not show abnormal B-raf hybridizing RNAs, arguing that this oncogene must be distinct from
B-raf.

Example 4. cDNA Expression Cloning of the Keratinocyte Growth Factor Receptor by Complementation for 
Autocrine Transformation 

4.1 Identification of epithelial cell cDNAs capable of transforming NIH/3T3 cells. 

We prepared a cDNA library from BALB/MK epidermal keratinocytes (Weissman, B.E. & Aaronson, S.A. Cell 32,
599-606 (1983)) using automatic directional cloning (Miki, T. et al. Gene 83, 137-146 (1989)) in an improved
expression vector, λpCEV27 (Miki, unpublished data). A library of 4.5 x 10^ independent clones was amplified, phage
particles were purified, and their DNA was extracted. DNA transfection of NIH/3T3 mouse embryo fibroblasts (Jainchill,
J.L. et al. J. Virol. 4, 549-553 (1969)), which synthesize KGF, was performed by the calcium phosphate technique
(Wigler, M. et al. Cell 11, 223-232 (1977)). Individual plates were examined at 10-18 days for the appearance of
transformed foci. We detected 15 foci among a total of 100 individual cultures transfected with 5 µg library cDNA/plate.
Each focus was tested and shown to be resistant to G418, indicating that it contained integrated vector sequences. Three
representative transformants were chosen for more detailed characterization based upon differences in their
morphologies (Fig. 13).

When we performed plasmid rescue, each transformant gave rise to at least 3 distinct cDNA clones as determined
by physical mapping. To examine their biological activities, each clone was subjected to transfection analysis on
NIH/3T3 cells. A single clone rescued from each transformant was found to possess high-titered transforming activity
ranging from 10^-10^ focus forming units/nmole DNA. Moreover, the morphology of foci induced by each cDNA was
similar to that of the parental transformant. Because of their distinct physical maps and distinguishable biological
properties, we tentatively designated the genes for these transforming cDNAs as epithelial cell transforming (ect) 1, 2 and
3. Transfectants induced by the individual transforming plasmids were utilized in subsequent analyses.

4.2 Specific KGF binding by ect1-transformed cells.

Suramin is an agent known to interfere with ligand-receptor interactions, including the binding of PDGF (Fleming,
T.P. et al. Proc. Natn. Acad. Sci. U.S.A. 86, 8063-8067 (1989); Betsholtz, C. et al. Proc. Natn. Acad. Sci. U.S.A. 83,
6440-6444 (1986)) and KGF with their respective receptors. When an ect1 transfectant was exposed to suramin, its
proliferation was markedly inhibited, associated with reversion of the transformed phenotype (data not shown). To further
investigate the possibility that ect1 might encode the KGF receptor, we performed binding studies with recombinant
[125I]-KGF as the tracer molecule. As shown in Fig. 14, BALB/MK cells demonstrated specific high affinity binding of
[125I]-KGF, while there was no such binding detectable to NIH/3T3 cells. Of note, expression of the ect1 gene by
NIH/3T3 cells resulted in the acquisition of 3.5-fold more [125I]-KGF binding sites than BALB/MK cells (Fig. 13). Under
these same conditions, neither NIH/ect2 nor NIH/ect3 bound significant levels of the labelled growth factor. These results
strongly suggested that ect1 encoded the KGF receptor, whose introduction into NIH/3T3 cells had completed an
autocrine transforming loop.

4.3 Molecular characterization of ect1.

To characterize ect1, the transforming 4.2 kb cDNA released by SalI digestion was used as a molecular probe to
hybridize SalI-restricted genomic DNAs of NIH/3T3 as well as NIH/3T3 transfectants containing ect1, ect2 or ect3. While
the expected 4.2 kb DNA fragment was detected in the ect1 transformant (Fig. 15a), neither NIH/3T3 nor the other
transfectants showed evidence of a SalI fragment hybridized by the cDNA insert. These results further argued that ect2
and ect3 represented independent transforming genes. When EcoRI was used to cleave normal mouse DNA, we
observed several distinct ect1 hybridizing DNA fragments, which reflected endogenous ect1 sequences or closely
related genes (Fig. 15b). Ect1 related sequences were also demonstrated in the DNAs of other species analyzed,
including human, indicating its high degree of conservation in vertebrate evolution.

When we analyzed expression of transcripts related to ect1 in BALB/MK and NIH/3T3 cells, we observed a single
transcript of around 4.2 kb in BALB/MK cells (Fig. 15c). Thus, our cDNA clone represented essentially the complete
ect1 transcript. In NIH/3T3 cells, a transcript of comparable size was only faintly detectable under relatively stringent
hybridization conditions. We estimated that its expression was several fold lower than the level of the 4.2 kb transcript in
BALB/MK cells. Thus, if this transcript were to represent ect1 rather than a related gene, its expression was markedly
lower in fibroblasts as compared to epithelial cells.

4.4 Ect1 encodes a transmembrane tyrosine kinase of the FGF receptor family.

We next determined the nucleotide sequence of the ect1 4.2 kb cDNA insert. Analysis of the predicted amino acid
sequence revealed a long open reading frame of 2235 nucleotides (nucleotide positions 562-2796). Two methionine
codons were found at nucleotide positions 619 and 676, respectively (Fig. 16a). The second methionine codon perfectly
matched Kozak's consensus for a translational initiator sequence (A/GCCATGG) (Kozak, M. Nucleic Acids Res. 15,
8125-8148 (1987)). Moreover, it was followed by a characteristic signal sequence of 21 residues, of which 10 were
identical to those of the putative signal peptide of the mouse bFGF receptor (Reid, H.H., et al. Proc. Natn. Acad. Sci.
U.S.A. 87, 1596-1600 (1990); Pasquale, E.B. & Singer, S.J. Proc. Natn. Acad. Sci. U.S.A. 86, 8722-8726 (1989); Safran,
A. et al. Oncogene 5, 635-643 (1990)). Thus, it seems likely that the second ATG is the authentic initiation codon for the
KGF receptor (KGFR). If so, the receptor polypeptide would comprise 707 amino acids with a predicted size of 82.5 kd.
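The coordinates quoted above are mutually consistent, as the following arithmetic sketch in Python shows. It uses only the positions stated in the text. The assumption that position 2796 is the last base of the final sense codon (that is, the stop codon is not counted in the open reading frame) is ours, and the constant and function names are illustrative.

```python
# Coordinates taken from the text (1-based, inclusive).
ORF_START, ORF_END = 562, 2796
MET_POSITIONS = (619, 676)
PREDICTED_SIZE_DA = 82500  # 82.5 kd, as stated in the text

def span_length(first, last):
    """Number of nucleotides from first to last, inclusive."""
    return last - first + 1

def in_frame(position, frame_start=ORF_START):
    """True if a codon starting at position lies in the reading frame of the ORF."""
    return (position - frame_start) % 3 == 0

def residues_from(start_codon, last=ORF_END):
    """Codons from start_codon to the end of the ORF (stop codon assumed excluded)."""
    return span_length(start_codon, last) // 3

if __name__ == "__main__":
    print("ORF length:", span_length(ORF_START, ORF_END))           # 2235
    print("both ATGs in frame:", all(in_frame(p) for p in MET_POSITIONS))
    print("residues from second ATG:", residues_from(676))          # 707
    print("mean residue mass (Da):", round(PREDICTED_SIZE_DA / residues_from(676), 1))
```

Initiation at the first ATG (position 619) would instead give 726 residues, so the stated figure of 707 amino acids corresponds to initiation at position 676.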

The amino acid sequence of the KGFR predicted a transmembrane tyrosine kinase most closely related to the
mouse bFGF receptor (bFGFR). The percent similarity between both proteins is shown in Fig. 16b. The putative KGFR
extracellular portion contained two immunoglobulin (IgG)-like domains, exhibiting 77% and 60% similarity with the
IgG-like domains 2 and 3, respectively, of the mouse bFGFR. Recent studies have revealed a variant form of the bFGFR,
whose extracellular domain also contains only these two corresponding IgG-like domains (Reid, H.H., et al. Proc. Natn.
Acad. Sci. U.S.A. 87, 1596-1600 (1990)). The sequence N-terminal to the first IgG-like domain of the KGFR was 63
residues long, in comparison to 88 residues found in the shorter form of the mouse bFGFR. Both the chicken and mouse
bFGFRs contain a series of eight consecutive acidic residues between the first and second IgG-like domains (Reid,
H.H., et al. Proc. Natn. Acad. Sci. U.S.A. 87, 1596-1600 (1990); Pasquale, E.B. & Singer, S.J. Proc. Natn. Acad. Sci.
U.S.A. 86, 8722-8726 (1989); Safran, A. et al. Oncogene 5, 635-643 (1990); Lee, P.L. et al. Science 245, 57-60
(1989)). This sequence is even retained in the shorter form of the mouse bFGFR, which lacks the first IgG-like domain
(Fig. 16b). However, the KGFR did not contain such an acidic domain. Whether this reflects significant functional
differences between these receptors remains to be determined.

The intracellular portion of the KGFR was highly homologous to the bFGFR tyrosine kinase (Fig. 16). The central
core of the catalytic domain was flanked by a relatively long juxtamembrane sequence, and the tyrosine kinase domain
was split by a short insert of 14 residues, similar to that observed in avian, human and murine bFGF receptors (Reid,
H.H., et al. Proc. Natn. Acad. Sci. U.S.A. 87, 1596-1600 (1990); Pasquale, E.B. & Singer, S.J. Proc. Natn. Acad. Sci.
U.S.A. 86, 8722-8726 (1989); Safran, A. et al. Oncogene 5, 635-643 (1990); Lee, P.L. et al. Science 245, 57-60
(1989); Ruta, M. et al. Oncogene 3, 9-15 (1988); and Ruta, M. et al. Proc. Natn. Acad. Sci. U.S.A. 86, 5449-5434
(1989)). Hanafusa and co-workers isolated a partial cDNA encoding a tyrosine kinase, designated bek, by bacterial
expression cloning using phosphotyrosine antibodies (Kornbluth, S., et al. Molec. Cell. Biol. 8, 5541-5544 (1988)). The
reported sequence of bek was identical to the KGFR in the tyrosine kinase domain (Fig. 16b). Although only a partial
sequence of bek is available, it is very likely to encode the mouse KGF receptor.

4.5 Functional analysis of the cloned KGF receptor. 

Because of the existence of more than one receptor of the FGF family (Reid, H.H., et al. Proc. Natn. Acad. Sci.
U.S.A. 87, 1596-1600 (1990)), we sought to characterize in detail the binding properties of the KGF receptor isolated by
expression cloning. Scatchard analysis of [125I]-KGF binding to the NIH/ect1 transfectant revealed expression of two
similar high affinity receptor populations. Out of a total of ~3.8 x 10^ sites/cell, 40% displayed a Kd of 180 pM, while the
remaining 60% showed a Kd of 480 pM (data not shown). These values are comparable to the high affinity KGF
receptors displayed by BALB/MK cells. The pattern of KGF and FGF competition for [125I]-KGF binding to NIH/ect1 cells
was also very similar to that observed with BALB/MK cells (Fig. 17). Although maximum [125I]-KGF binding to NIH/ect1
cells was 3.5-fold higher than to BALB/MK cells, there was 50% displacement by 2 ng/ml of either KGF or aFGF with
each cell type. Similarly, both cell types showed 15-fold less efficient competition by bFGF for bound [125I]-KGF.
Together with observations that parental NIH/3T3 cells lack detectable specific [125I]-KGF binding (Fig. 14), these
results demonstrate that the cloned KGF receptor expressed in NIH/3T3 cells conferred the characteristic pattern of KGF
and FGF competition displayed by BALB/MK cells.
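The two receptor populations reported by the Scatchard analysis can be pictured with a simple equilibrium model. The following Python sketch assumes two independent, non-interacting classes of sites, which is the usual textbook interpretation of a two-component Scatchard plot and is not a model stated in this disclosure. Only the 40%/60% split and the Kd values of 180 pM and 480 pM come from the text; the function names are illustrative, and ligand concentrations are free concentrations in pM.

```python
# (fraction of total sites, Kd in pM) for each population, from the text.
POPULATIONS = ((0.4, 180.0), (0.6, 480.0))

def fractional_occupancy(free_pm, populations=POPULATIONS):
    """Fraction of all sites occupied at a given free ligand concentration (pM)."""
    return sum(frac * free_pm / (kd + free_pm) for frac, kd in populations)

def scatchard_ratio(free_pm, populations=POPULATIONS):
    """Bound/free per unit of total sites, the quantity on a Scatchard y-axis."""
    return sum(frac / (kd + free_pm) for frac, kd in populations)

if __name__ == "__main__":
    for conc in (50.0, 180.0, 480.0, 2000.0):
        print(conc, "pM ->", round(fractional_occupancy(conc), 3))
```

Because the two Kd values differ by less than threefold, the curve produced by this model departs only modestly from that of a single class of sites, consistent with the description of the two populations as similar.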

When [125I]-KGF is crosslinked to its receptors on BALB/MK cells, two protein species of 165 and 137 kd have been
observed. Taking into account the size of KGF itself, we have estimated the corresponding receptor species to be
around 140 and 115 kd, respectively. When [125I]-KGF crosslinking was performed with NIH/ect1 cells, we observed a
single species corresponding in size to the smaller, 137 kd species in BALB/MK cells (Fig. 17a). Moreover, detection of
this band was specifically and efficiently blocked by unlabelled KGF. When glycosylation is considered, the size of the
KGF receptor predicted by sequence analysis corresponds reasonably well with the corrected size (115 kd) of the
crosslinked KGF receptor in the ect1 transfectant.
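The size correction described above is a subtraction, sketched here in Python using only figures stated in the text. The model that each crosslinked complex equals one receptor plus one KGF molecule is an assumption made for illustration, and the names are hypothetical.

```python
# Sizes in kd, from the text.
CROSSLINKED_COMPLEX = {"larger": 165.0, "smaller": 137.0}
CORRECTED_RECEPTOR = {"larger": 140.0, "smaller": 115.0}
PREDICTED_POLYPEPTIDE = 82.5

def implied_ligand_size(species):
    """Size attributed to bound KGF, assuming complex = receptor + one ligand."""
    return CROSSLINKED_COMPLEX[species] - CORRECTED_RECEPTOR[species]

def size_not_explained_by_sequence(receptor_kd, polypeptide_kd=PREDICTED_POLYPEPTIDE):
    """Part of the receptor size not accounted for by the predicted polypeptide."""
    return receptor_kd - polypeptide_kd

if __name__ == "__main__":
    for species in ("larger", "smaller"):
        print(species, "implied KGF contribution:", implied_ligand_size(species), "kd")
    print("unexplained by sequence:",
          size_not_explained_by_sequence(CORRECTED_RECEPTOR["smaller"]), "kd")
```

The 32.5 kd difference between the 115 kd species and the 82.5 kd predicted polypeptide is the portion that the text attributes to glycosylation.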

As a final test of the functional nature of the KGF receptor expressed in NIH/ect1 cells, we investigated its capacity
to induce tyrosine phosphorylation of cellular proteins. Thus, intact NIH/ect1 cells were exposed to KGF for 10 min, and
cell lysates were subjected to immunoprecipitation and immunoblotting analysis utilizing anti-phosphotyrosine
(anti-Ptyr) antibody. As shown in Fig. 17b, several putative substrates were tyrosine phosphorylated in response to KGF
addition. These included pp55, pp65, pp90, pp115, pp150 and pp190. Previous studies have indicated that proteins of
similar size are phosphorylated in response to KGF triggering of BALB/MK cells. Moreover, the 115 kd phosphoprotein
matches the corrected size of the KGF receptor crosslinked by [125I]-KGF. Thus, it may reflect the autophosphorylated
KGF receptor itself.

For purposes of completing the background description and present disclosure, each of the published articles,
patents and patent applications heretofore identified in this specification is hereby incorporated by reference into the
specification.

The foregoing invention has been described in some detail for purposes of clarity and understanding. It will also be 
obvious that various combinations in form and detail can be made without departing from the scope of the invention. 

One skilled in the art will appreciate that the capacity for efficient rescue of cDNA clones from mammalian cells is
an important function of a stable mammalian expression cloning system. When plasmid cDNA libraries are used to
transfect mammalian cells, single plasmids integrated in genomic DNA are difficult to release. Plasmid rescue is readily
achieved only when multiple copies are clustered at a single integration site (Noda et al. PNAS 86, 162-166, 1989).
Excision of the plasmid by induction of replication from the SV40 origin using COS cell fusion often results in
rearrangement or truncation of the cDNA clones, so that they cannot be recovered efficiently from stable integration sites
within mammalian host cells. Applicants used a strategy involving λ-plasmid composite vectors. The vectors contain
plasmid excision sites for multiple cutters including NotI, MluI and XhoI. This allows the intact plasmid containing the
insert to be efficiently rescued with a low probability of internal cleavage of the insert.

Claims 

1. A DNA segment encoding a keratinocyte growth factor receptor.

2. The DNA segment according to claim 1 wherein said receptor has the amino acid sequence given in Figure 15a,
or allelic or species variations thereof.

3. A DNA construct comprising a vector and a DNA segment encoding a keratinocyte growth factor receptor. 

4. A DNA construct comprising a vector and a DNA segment encoding a keratinocyte growth factor receptor, wherein 
the vector is a genetic cloning vector comprising at least one replicon; 

a site for inserting a DNA segment to be cloned that includes at least two non-symmetrical restriction enzyme 
recognition sequences that can be cleaved by a single restriction enzyme; and

at least two regulatory elements located in relation to said site for insertion such that, when a DNA segment is 
inserted into said site, transcription of said DNA segment can be effected in both the sense and antisense 
directions. 

5. The construct according to claim 3 wherein the vector is pCEV27. 

6. A host cell comprising the construct according to claim 3. 

7. A keratinocyte growth factor receptor substantially free of proteins with which it is naturally associated. 

8. The receptor according to claim 7 wherein said receptor has a higher affinity for keratinocyte growth factor and
acidic fibroblast growth factor than basic fibroblast growth factor.

9. The receptor according to claim 8 wherein said receptor has the sequence shown in Figure 15a.

10. A process of producing a keratinocyte growth factor receptor comprising culturing the cells according to claim 6
under conditions such that said DNA segment is expressed and said factor thereby produced.

11. A method of identifying the presence in a DNA sequence of a gene the protein product of which confers a
phenotypically identifiable trait comprising:

i) preparing a DNA expression library containing said DNA sequence in a cloning vector, in a manner such that
said gene is represented in said library;

ii) introducing said library into a population of cells under conditions such that integration into the genome of
said cells and expression of said gene are effected, so that said phenotypically identifiable trait is caused to be
displayed;

iii) isolating DNA from said cells of said population displaying said phenotypically identifiable trait; and

iv) excising a DNA segment containing said gene from said integrated DNA, wherein said cloning vector is one
of the following:

a) a genetic cloning vector comprising at least one replicon; and 

a site for inserting DNA segments to be cloned that includes at least two non-symmetrical restriction
enzyme recognition sequences, wherein

at least two of said non-symmetrical restriction enzyme recognition sequences are nonidentical, said 
vector further comprising regulatory elements that are located in relation to said site for insertion of 
DNA segments such that, when a DNA segment is inserted into said site, at least a portion of the 
sequence of said DNA segment is transcribed; or 

b) a genetic cloning vector comprising 

at least one replicon; 

a site for inserting a DNA segment to be cloned that includes at least two non-symmetrical restriction 
enzyme recognition sequences that can be cleaved by a single restriction enzyme; and 
at least two regulatory elements located in relation to said site for insertion such that, when a DNA 
segment is inserted into said site, transcription of said DNA segment can be effected in both the sense 
and antisense directions. 

12. The method according to claim 11 wherein said DNA sequence is a cDNA sequence.

13. The method according to claim 11 wherein said gene encodes a ligand, a receptor for which is normally produced
by cells of said population, said cells, prior to introduction of said gene, being incapable of producing said ligand.

14. The method according to claim 11 wherein said gene encodes a receptor, the ligand which binds thereto being
normally produced by cells of said population, said cells, prior to introduction of said gene, being incapable of
producing said receptor.

15. The method according to claim 11 wherein said phenotypically identifiable trait is uncontrolled proliferation.

16. The method according to claim 14 wherein said receptor is the keratinocyte growth factor receptor.

17. The method according to claim 11 wherein said phenotypically identifiable trait is drug resistance.

18. A DNA segment having a sequence that encodes the amino acid sequence shown in Figure 9b.

19. A DNA segment according to claim 18 wherein said segment has the sequence shown in Figure 9b.